TWO WEIGHT INEQUALITY FOR THE HILBERT TRANSFORM: 
A REAL VARIABLE CHARACTERIZATION 



MICHAEL T. LACEY, ERIC T. SAWYER, CHUN-YEN SHEN, AND IGNACIO URIARTE-TUERO 

Abstract. Let $\sigma$ and $w$ be locally finite positive Borel measures on $\mathbb{R}$ which do not share a 
common point mass. Then, the Hilbert transform $H(\sigma f)$ maps from $L^2(\sigma)$ to $L^2(w)$ if and 
only if $H(\sigma f)$ maps $L^2(\sigma)$ into weak-$L^2(w)$, and the dual weak-type inequality holds. This is a 
corollary to a more precise characterization in terms of a Poisson $A_2$ condition on the pair of 
weights, and conditions phrased in terms of testing the norm inequality over any subset of an 
arbitrary interval. 
1. Introduction 



Let 
$$H\nu(x) = \mathrm{p.v.}\int_{\mathbb{R}} \frac{\nu(dy)}{y-x}$$
be the Hilbert transform of the measure $\nu$. In this definition, the 
principal value need not exist, so we always understand that there is some standard truncation 
of the integral in place, and all relevant estimates are assumed to be independent of how the 
truncation is taken. Given weights (i.e. locally bounded positive Borel measures) $\sigma$ and $w$ on the 
real line $\mathbb{R}$, we consider the following two weight norm inequality for the Hilbert transform, 



$$\int_{\mathbb{R}} |H(f\sigma)|^2\,dw \le \mathcal{N}^2 \int_{\mathbb{R}} |f|^2\,d\sigma, \qquad f \in L^2(\sigma), \tag{1.1}$$
where $\mathcal{N}$ is the best constant in the inequality, uniform over all truncations of the Hilbert transform 
kernel. A question due to Nazarov-Treil-Volberg, see [29], is whether or not (1.1) is equivalent to 
the following necessary conditions. The (half-Poisson) $A_2$ condition 

$$\frac{\sigma(I)}{|I|}\cdot P(w,I) \le \mathcal{A}_2,$$

where the inequalities are uniform over intervals $I$, and $P(\sigma, I)$ is the usual Poisson extension of 
$\sigma$ evaluated at the point $(x_I, |I|)$ in the upper half-plane, where $x_I$ is the center of $I$. And the interval 
testing conditions 



$$\int_I |H(\mathbf{1}_I\sigma)|^2\,dw \le \mathcal{T}^2\sigma(I), \qquad \int_I |H(\mathbf{1}_I w)|^2\,d\sigma \le \mathcal{T}^2 w(I),$$



also holding uniformly over all intervals $I$. The norm inequality above is phrased in a self-dual 
fashion, namely the dual inequality is obtained by interchanging the weights $w$ and $\sigma$. Thus, 
the two testing conditions above are dual. The best constants in the two inequalities can be of 
different orders of magnitude. 

In this paper we prove a weaker variant of this conjecture, with the two interval testing conditions 
replaced with stronger testing conditions on bounded functions. 



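Theorem 1.3. Let $\sigma$ and $w$ be locally finite positive Borel measures on the real line with no 
common point masses. There holds 

$$\mathcal{N} \simeq \mathcal{A}_2^{1/2} + \mathcal{T}_\infty,$$

where the latter constant is the best constant in the inequalities below. 

$$\int_I |H(\sigma f \mathbf{1}_I)|^2\,dw \le \mathcal{T}_\infty^2 \|f\|_\infty^2\,\sigma(I), \qquad \int_I |H(w g \mathbf{1}_I)|^2\,d\sigma \le \mathcal{T}_\infty^2 \|g\|_\infty^2\, w(I).$$

These hold uniformly over all intervals $I$, and functions $f$, $g$. Note that only the $L^\infty$ norms of the 
functions enter into the right hand side. 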

A characterization in terms of weak-type norms follows. 

Corollary 1.4. Under the hypotheses above, there holds $\mathcal{N} \simeq \mathcal{W}$, where the latter constant is 
the best constant in the weak-type inequalities 

$$\|H(\sigma f)\|_{L^{2,\infty}(w)} \le \mathcal{W}\|f\|_{L^2(\sigma)}, \qquad \|H(w g)\|_{L^{2,\infty}(\sigma)} \le \mathcal{W}\|g\|_{L^2(w)}.$$

That $\mathcal{N} \simeq \mathcal{A}_2^{1/2} + \mathcal{W}$ is the immediate corollary, using well-known duality properties of Lorentz 
spaces. But in addition, the half-Poisson constant satisfies $\mathcal{A}_2^{1/2} \lesssim \mathcal{W}$, as follows from inspection 
of the proof of $\mathcal{A}_2^{1/2} \lesssim \mathcal{N}$ in [8, Section 2]. 

The form of our Theorem and Corollary closely matches results about positive operators [27] and, 
in the context of singular integrals, the main results of [21]. The latter paper focuses on the special 
case of $\sigma = 1/w$ with $w$ an $A_2$ weight, and the Hilbert transform is replaced by an arbitrary 
Calderón-Zygmund operator in any dimension, providing a sharp estimate of the norm of the 
operator in terms of the $A_2$ constant of $w$ and the testing constant. This result was employed by 
Hytönen, in his resolution of the $A_2$ conjecture [4]. (Simpler proofs have subsequently been found.) 
Our main theorem bears strong resemblance to the characterizations of two weight inequalities 
found in [7]. 

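We remark that our constant $\mathcal{T}_\infty$ is comparable to the best constant in the inequalities 

$$\int_I |H(\sigma \mathbf{1}_F)|^2\,dw \le \mathcal{T}^2 \sigma(I), \qquad \int_I |H(w \mathbf{1}_G)|^2\,d\sigma \le \mathcal{T}^2 w(I),$$

where $F, G \subset I$, and $I$ is an arbitrary interval. 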

In the circumstances in which the two weight problem arises, one would like sufficient conditions 
for the $L^2$ norm inequality that are as simple as possible, namely the interval testing condition 
above. Verifying that one has $\mathcal{N} \lesssim \mathcal{A}_2^{1/2} + \mathcal{T} =: \mathcal{H}$ appears to require techniques beyond the 
scope of this paper. Also, certain complex variable characterizations of the two weight inequality 
were found by Cotlar-Sadosky, [1]. 

The Nazarov-Treil-Volberg conjecture has only been verified before under additional hypotheses 
on the pair of weights, hypotheses which are not necessary for the two weight inequality. The 
so-called pivotal condition of [29] is not necessary, as was proved in [8]. The pivotal condition is 
still an interesting condition: It is all that is needed to characterize the boundedness of the Hilbert 
transform, together with the Maximal Function in both directions. But, the boundedness of this 
triple of operators is decoupled in the two weight setting [24]. 

Our argument has these attributes. 

(1) Certain degeneracies of the pair of weights must be addressed, the contribution of the 
innovative 2004 paper of Nazarov-Treil-Volberg [17], also see [29], which was further 
sharpened with the property of energy in [8]. This theme is further developed herein. 

(2) Properties of the Hilbert transform must be carefully exploited. This was a key contribution 
of [8], and it is continued here. What was known before is listed in §5. This paper adds 
two additional properties to the list. 

(3) The proof should proceed through the analysis of the bilinear form $\langle H(\sigma f), g\rangle_w$, as one 
expects certain paraproducts to appear. Still, the paraproducts have no canonical form, 
suggesting that the proof be highly non-linear in f and g. The non-linear point of view 
was initiated in [9], and is central to this paper. A particular feature of our arguments is a 
repeated appeal to certain quasi-orthogonality arguments, providing (many) simplifications 
over prior arguments. For instance, we never find ourselves constructing auxiliary measures, 
and verifying that they are Carleson, a frequent step in many related arguments. 

(4) Corona decompositions should be recursive. We herein establish the first such decompo- 
sition, called the parallel corona, see Theorem 3.7. The proof of this Theorem depends in 
a critical way on a new property of the Hilbert transform, the functional energy inequality 
of Theorem 7.4. Both of these are of independent interest. 

(5) There is a function theory relevant to non-doubling measure spaces in one dimension. 
An essential intermediate step is to provide (very sharp) sufficient conditions for the two 
weight inequality in terms of testing bounded functions, and functions of minimal bounded 
fluctuation, see Definition 4.5, and Theorem 4.8. 

(6) The testing constant for minimal bounded fluctuation functions is then shown to be 
dominated by the $A_2$ and the $L^\infty$ testing constant, an argument that depends on a remarkable 
property of minimal bounded fluctuation functions: Their martingale averages have finite 
variation. 

One can phrase a two weight inequality question for any operator $T$, a question that became 
apparent with the foundational paper of Muckenhoupt [11] on $A_p$ weights for the Maximal 
Function. Indeed, the case of Hardy's inequality was quickly resolved by Muckenhoupt [12]. The 
Maximal Function was resolved by one of us [26], as were the fractional integrals and, essential for 
this paper, Poisson integrals [27]. The latter paper established a result which closely paralleled the 
contemporaneous T1 theorem of David and Journé [2]. This connection, fundamental in nature, 
was not fully appreciated until the innovative work of Nazarov-Treil-Volberg [14-16] in 
developing a non-homogeneous theory of singular integrals. The two weight problem for dyadic singular 
integrals was only resolved recently [18]. Partial information about the two weight problem for 
singular integrals [21] was basic to the resolution of the $A_2$ conjecture [4], and several related 
results [5,6,21,22]. Our result is the first real variable characterization of a two weight inequality 
for a continuous singular integral. 

Interest in the two weight problem for the Hilbert transform arises from its natural occurrence 
in questions related to operator theory [20,25], spectral theory [20], model spaces [23], and 
analytic function spaces [10]. In the context of operator theory, Sarason posed the conjecture 
(see [3]) that the Hilbert transform would be bounded if the pair of weights satisfied the (full) 
Poisson $A_2$ condition. This was disproved by Nazarov [13]. Advances on these questions have 
been linked to finer understanding of the two weight question, see for instance [19,20], which 
build upon Nazarov's counterexample. 

§2 introduces terminology associated with dyadic grids, and the basic notion of the good and 
bad intervals. Following that, the parallel corona is described. The function theory takes up §4, 
and this section concludes with the proof of Theorem 4.8, modulo the parallel corona. The proof 
of the latter depends critically on the functional energy inequality proved in §7. Estimating the 
minimal bounded fluctuation testing constant is in §6. 

Acknowledgment. The authors benefited from a stimulating conference on two weight inequalities 
at the American Institute of Mathematics, Palo Alto, California, in October 2011. 

2. Dyadic Grids and Haar Functions 

2.1. Dyadic Grids. A collection of intervals $\mathcal{G}$ is a grid if for all $G, G' \in \mathcal{G}$, we have $G \cap G' \in 
\{\emptyset, G, G'\}$. By a dyadic grid we mean a grid $\mathcal{D}$ of intervals of $\mathbb{R}$ such that for each interval $I \in \mathcal{D}$, 
the subcollection $\{I' \in \mathcal{D} : |I'| = |I|\}$ partitions $\mathbb{R}$, aside from endpoints of the intervals. In 
addition, the left and right halves of $I$, denoted by $I_\pm$, are also in $\mathcal{D}$. 

For $I \in \mathcal{D}$, the left and right halves $I_\pm$ are referred to as the children of $I$. We denote by $\pi_{\mathcal{D}}(I)$ 
the unique interval in $\mathcal{D}$ having $I$ as a child, and we refer to $\pi_{\mathcal{D}} I$ as the $\mathcal{D}$-parent of $I$. 
We will work with subsets $\mathcal{F} \subset \mathcal{D}$. We say that $I$ has $\mathcal{F}$-parent $\pi_{\mathcal{F}} I = F$ if $F \in \mathcal{F}$ is the minimal 
element of $\mathcal{F}$ that contains $I$. The $\mathcal{F}$-children of $F \in \mathcal{F}$ are the maximal $F' \in \mathcal{F}$ which are strictly 
contained in $F$. 
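
The parent and F-parent operations can be made concrete in the standard dyadic grid. The following is a minimal sketch, not part of the paper: it assumes the standard grid, encodes the interval [n 2^k, (n+1) 2^k) as the pair (k, n), and the names `children`, `parent`, `contains` and `f_parent` are hypothetical helpers chosen for illustration.

```python
from typing import Iterable, Optional, Tuple

# Assumption: an interval of the standard dyadic grid is encoded as (k, n),
# meaning [n * 2**k, (n + 1) * 2**k). This encoding is illustrative only.
Interval = Tuple[int, int]


def children(I: Interval) -> Tuple[Interval, Interval]:
    """Left and right halves I_-, I_+ of I."""
    k, n = I
    return (k - 1, 2 * n), (k - 1, 2 * n + 1)


def parent(I: Interval) -> Interval:
    """The D-parent: the unique dyadic interval having I as a child."""
    k, n = I
    return (k + 1, n // 2)  # floor division also handles negative n


def contains(big: Interval, small: Interval) -> bool:
    """True if `small` is a subset of `big` (both dyadic)."""
    kb, nb = big
    ks, ns = small
    return kb >= ks and (ns >> (kb - ks)) == nb


def f_parent(I: Interval, family: Iterable[Interval]) -> Optional[Interval]:
    """Minimal element of `family` containing I, or None if there is none.

    Dyadic intervals containing I are nested, so the minimal one is the
    one of smallest scale k.
    """
    candidates = [F for F in family if contains(F, I)]
    return min(candidates, key=lambda F: F[0]) if candidates else None
```

For instance, with the family {[0,8), [0,4), [2,4)}, the interval [2,3) has F-parent [2,4), while [1,2) has F-parent [0,4).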

2.2. Haar Functions. Let $\sigma$ be a weight on $\mathbb{R}$, one that does not assign positive mass to any 
endpoint of a dyadic grid $\mathcal{D}$. If $I \in \mathcal{D}$ is such that $\sigma$ assigns non-zero weight to both children of 
$I$, the associated Haar function is 

$$h^\sigma_I := \sqrt{\frac{\sigma(I_-)\,\sigma(I_+)}{\sigma(I)}}\left(-\frac{I_-}{\sigma(I_-)} + \frac{I_+}{\sigma(I_+)}\right).$$

In this definition, we are identifying an interval with its indicator function, and we will do so 
throughout the remainder of the paper. This is an $L^2(\sigma)$-normalized function, and has $\sigma$-integral 
zero. For any dyadic interval $I_0$, it holds that $\{\sigma(I_0)^{-1/2} I_0\} \cup \{h^\sigma_I : I \in \mathcal{D}, I \subset I_0\}$ is an orthogonal 
basis for $L^2(I_0, \sigma)$. 

We will use the notations $\hat f(I) = \langle f, h^\sigma_I\rangle_\sigma$, as well as 

$$\Delta^\sigma_I f = \langle f, h^\sigma_I\rangle_\sigma h^\sigma_I = I_+ E^\sigma_{I_+} f + I_- E^\sigma_{I_-} f - I E^\sigma_I f .$$

The second equality is the familiar martingale difference equality, and so we will refer to $\Delta^\sigma_I f$ as 
a martingale difference. It implies the familiar telescoping identity $E^\sigma_J f = \sum_{I : I \supsetneq J} E^\sigma_J \Delta^\sigma_I f$. 

For any function $f$, the Haar support of $f$ is the collection $\{I \in \mathcal{D} : \hat f(I) \neq 0\}$. 
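
The normalization, the mean-zero property, and the martingale difference equality can be checked numerically on a toy measure. This is a hedged sketch under assumptions that are not in the paper: the weight is a finite sum of point masses at 0, 1, ..., N-1 (stored in the list `weights`), an interval is a half-open index range [lo, hi), and $E^\sigma_I f$ is taken to be the $\sigma$-average of $f$ over $I$. All function names are hypothetical.

```python
import math


def mass(weights, lo, hi):
    """sigma([lo, hi)) for the discrete measure with the given point masses."""
    return sum(weights[lo:hi])


def average(f, weights, lo, hi):
    """Weighted average of f over [lo, hi), assumed to be E_I^sigma f."""
    return sum(f[x] * weights[x] for x in range(lo, hi)) / mass(weights, lo, hi)


def inner(f, g, weights):
    """Inner product of f and g in L^2(sigma)."""
    return sum(a * b * w for a, b, w in zip(f, g, weights))


def haar(weights, lo, hi):
    """Weighted Haar function of I = [lo, hi), as a list of values."""
    mid = (lo + hi) // 2
    m_minus, m_plus = mass(weights, lo, mid), mass(weights, mid, hi)
    if m_minus <= 0 or m_plus <= 0:
        raise ValueError("both children must carry positive mass")
    c = math.sqrt(m_minus * m_plus / (m_minus + m_plus))
    return [
        (-c / m_minus if x < mid else c / m_plus) if lo <= x < hi else 0.0
        for x in range(len(weights))
    ]


def martingale_difference(f, weights, lo, hi):
    """Right-hand side of the martingale difference equality on I = [lo, hi)."""
    mid = (lo + hi) // 2
    e_minus = average(f, weights, lo, mid)
    e_plus = average(f, weights, mid, hi)
    e_all = average(f, weights, lo, hi)
    return [
        ((e_minus if x < mid else e_plus) - e_all) if lo <= x < hi else 0.0
        for x in range(len(weights))
    ]
```

With `weights = [1, 1, 1, 5]` the top interval is unbalanced in the sense of Section 3, and the Haar function takes its large (negative) value on the light left half.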

2.3. Good-Bad Decomposition. With a choice of dyadic grid $\mathcal{D}$ understood, we say that $J \in \mathcal{D}$ 
is $(\varepsilon, r)$-good if and only if for all intervals $I \in \mathcal{D}$ with $|I| \ge 2^{r+1}|J|$, the distance from $J$ to the 
boundary of either child of $I$ is at least $|J|^{\varepsilon}|I|^{1-\varepsilon}$. 

For $f \in L^2(\sigma)$ we set 

$$P^\sigma_{\mathrm{good}} f = \sum_{I \in \mathcal{D} \,:\, I \text{ is } (\varepsilon, r)\text{-good}} \Delta^\sigma_I f .$$

The projection $P^w_{\mathrm{good}} g$ is defined similarly. 

To make the two reductions below, one must make a random selection of grids, as is detailed 
in [8,29]. The use of random dyadic grids has been a basic tool since the foundational work of 
[14-16]. Important elements of the suppressed construction of random grids are that 

(1) It suffices to consider a single dyadic grid $\mathcal{D}$, but we will sometimes write $\mathcal{D}^\sigma$ and $\mathcal{D}^w$ to 
emphasize the role of the two weights. 

(2) For any fixed $0 < \varepsilon < \frac12$, we can choose integer $r$ sufficiently large so that it suffices 
to consider $f$ such that $f = P^\sigma_{\mathrm{good}} f$, and likewise for $g \in L^2(w)$. Namely, it suffices to 
estimate the constant below, for arbitrary dyadic grid $\mathcal{D}$, 

$$|\langle H_\sigma f, g\rangle_w| \le \mathcal{N}_{\mathrm{good}} \|f\|_\sigma \|g\|_w ,$$

where it is required that $f = P^\sigma_{\mathrm{good}} f \in L^2(\sigma)$ and $g = P^w_{\mathrm{good}} g \in L^2(w)$. 

That the functions are good is, at some moments, an essential property. We suppress it in 
notation, however taking care to emphasize in the text those places in which we appeal to the 
property of being good. 
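
The goodness condition can be tested mechanically. The sketch below is illustrative only and makes assumptions not taken from the paper: it uses the standard dyadic grid with integer scales, reads the size condition as $|I| \ge 2^{r+1}|J|$, and checks only scales up to a cutoff `max_scale` (in the standard grid every interval eventually fails at very large scales because of the point 0, which is one reason random grids are used). The names `dist_to_lattice` and `is_good` are hypothetical.

```python
def dist_to_lattice(a: int, b: int, step: int) -> int:
    """Distance from the interval [a, b] to the lattice step * Z (0 if they meet)."""
    first = -(-a // step) * step  # smallest multiple of step that is >= a
    if first <= b:
        return 0
    return min(a - (first - step), first - b)


def is_good(k: int, n: int, eps: float, r: int, max_scale: int) -> bool:
    """Check (eps, r)-goodness of J = [n * 2**k, (n + 1) * 2**k), with k >= 0.

    The boundaries of the children of all dyadic I with |I| = 2**m form the
    lattice 2**(m - 1) * Z, so for each m with 2**m >= 2**(r + 1) * |J| we
    compare dist(J, 2**(m - 1) * Z) with |J|**eps * |I|**(1 - eps).
    Only scales m <= max_scale are examined (an assumption of this sketch).
    """
    a, b = n * 2**k, (n + 1) * 2**k
    for m in range(k + r + 1, max_scale + 1):
        threshold = 2.0 ** (k * eps + m * (1 - eps))
        if dist_to_lattice(a, b, 2 ** (m - 1)) < threshold:
            return False
    return True
```

For example, with $\varepsilon = 0.45$, $r = 5$ and scales up to $2^8$, the unit interval [42, 43) passes every test, while [32, 33) fails immediately because it touches a multiple of 32.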
3. The Parallel Corona 

With the notations of the previous section, the two weight inequality (1.1) is equivalent to 
boundedness of the bilinear form on $L^2(\sigma) \times L^2(w)$, 

$$B(f, g) = \langle H(f\sigma), g\rangle_w = \sum_{I \in \mathcal{D}^\sigma}\sum_{J \in \mathcal{D}^w} \langle H_\sigma(\Delta^\sigma_I f), \Delta^w_J g\rangle_w .$$

Here, and for the remainder of the paper, we use the notation $H_\sigma f = H(\sigma f)$. Also, as $L^2$-norms 
predominate, we set $\|\phi\|_\nu := \|\phi\|_{L^2(\nu)}$. 

Definition 3.1. We say that a dyadic interval $I \in \mathcal{D}^\sigma$ is balanced if 

$$\frac14 \le -\frac{\inf_{x\in I} h^\sigma_I(x)}{\sup_{x\in I} h^\sigma_I(x)} \le 4 .$$

Note that since $h^\sigma_I$ has $\sigma$-mean zero, the infimum is necessarily negative. If $I$ is not balanced, 
it is said to be unbalanced. An unbalanced interval $I$ has children $I_{\mathrm{small}}$ and $I_{\mathrm{large}}$ satisfying 

$$4\sigma(I_{\mathrm{small}}) < \sigma(I_{\mathrm{large}}).$$

We say that $f \in L^2(\sigma)$ is balanced if the Haar support of $f$ only consists of balanced intervals. 
The definition of $f$ being unbalanced is at slight variance: We say that $f$ is unbalanced if for all 
intervals $I$ in the Haar support of $f$ we have 

$$\sup \Delta^\sigma_I f < -4 \inf \Delta^\sigma_I f .$$

That is, the 'large value' of the martingale difference is negative. It is clear that any $f \in L^2(\sigma)$ is 
the orthogonal sum $f_b + f_{u,1} - f_{u,2}$, where $f_b$ is balanced, and the $f_{u,j}$ are unbalanced. It suffices 
to assume that $f$ is either balanced, or unbalanced, and likewise for $g$. 

A particular feature of this paper is a careful analysis of the unbalanced functions. This termi- 
nology will help the reader identify these portions of the proofs below. 

Definition 3.2. A collection $\mathcal{F}$ of dyadic intervals is $\sigma$-Carleson if 

$$\sum_{F \in \mathcal{F} \,:\, F \subset S} \sigma(F) \le C_{\mathcal{F}}\,\sigma(S), \qquad S \in \mathcal{F}. \tag{3.3}$$

The constant $C_{\mathcal{F}}$ is referred to as the Carleson norm of $\mathcal{F}$. 

Throughout, we can take $C_{\mathcal{F}}$ to be a fixed constant. We will work with two functions that 
are supported on an interval $I_0$, that will change repeatedly. Let $L^2_0(I_0, \sigma)$ be the functions in $L^2(\sigma)$ 
that are supported on $I_0$ and have $\int_{I_0} f\,d\sigma = 0$. It is very easy to reduce to the case of $f$ and $g$ being of 
integral zero in their respective spaces, and so we always assume this. 

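Definition 3.4. For fixed constants $C_{cz} > 4$, we call $\mathcal{F} \subset \mathcal{D}^\sigma$ and non-negative numbers 
$\{\alpha_f(F) : F \in \mathcal{F}\}$ Calderón-Zygmund stopping data for $f \in L^2_0(I_0)$, with constant $C_{cz}$, if these 
properties hold. 

(1) $I_0$ is the maximal element of $\mathcal{F}$. 

(2) For all $I \in \mathcal{D}^\sigma$, $I \subset I_0$, we have $|E^\sigma_I f| \le C_{cz}\,\alpha_f(\pi_{\mathcal{F}} I)$. 

(3) $\alpha_f$ is monotonic: If $F, F' \in \mathcal{F}$ and $F \subset F'$ then $\alpha_f(F) \ge \alpha_f(F')$. 

(4) The collection $\mathcal{F}$ is $\sigma$-Carleson in the sense of (3.3), with constant $C_{cz}$. 

(5) We have the inequality 

$$\sum_{F \in \mathcal{F}} \alpha_f(F)^2 \sigma(F) \le C_{cz}^2 \|f\|_\sigma^2 . \tag{3.5}$$

We will consistently use the notation 

$$P^\sigma_F f := \sum_{I \in \mathcal{D}^\sigma \,:\, \pi_{\mathcal{F}} I = F} \Delta^\sigma_I f .$$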

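We can fix the constant $C_{cz}$ to be some large fixed number for the remainder of the proof. 
We will very commonly derive sums of the form below, in which $Q^w_F$ is some family of mutually 
orthogonal projections in $L^2(w)$. 

$$\sum_{F \in \mathcal{F}} \big\{\alpha_f(F)\sigma(F)^{1/2} + \|P^\sigma_F f\|_\sigma\big\}\|Q^w_F g\|_w \lesssim \Big[\sum_{F \in \mathcal{F}} \big\{\alpha_f(F)^2 \sigma(F) + \|P^\sigma_F f\|_\sigma^2\big\} \times \sum_{F \in \mathcal{F}} \|Q^w_F g\|_w^2\Big]^{1/2} \lesssim \|f\|_\sigma \|g\|_w .$$

This follows from Cauchy-Schwarz and (3.5). This inequality we will refer to as the quasi- 
orthogonality argument. It is systemic to the proof. 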

The simplest way to select the stopping data is to take $\mathcal{F}$ to be stopping intervals for the 
weighted averages of $f$. That is, if $f$ is supported on interval $I_0$, we construct the stopping data 
as follows. We set $I_0 \in \mathcal{F}$, defining $\alpha_f(I_0) = E^\sigma_{I_0}|f|$. Inductively, for $F \in \mathcal{F}$, the maximal subintervals 
$I \subset F$ such that $E^\sigma_I|f| > 4E^\sigma_F|f|$ are also in $\mathcal{F}$. We then take $\alpha_f(F) = E^\sigma_F|f|$. This is Calderón- 
Zygmund stopping data for $f$ with constant $C_{cz}$ bounded by an absolute constant. We will refer 
to this as the standard Calderón-Zygmund stopping data. There are however other choices that 
we will appeal to. The intervals in $\mathcal{F}$ need not be good intervals, and goodness of $F$ will never be 
used in the proof. 
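
The inductive selection of the standard stopping intervals can be sketched in code. This is an illustration under assumptions that are not in the paper: the weight is a finite sum of point masses at 0, ..., N-1 with N a power of two, intervals are half-open index ranges inside [0, N), and averages over intervals of zero mass are treated as 0. The function names are hypothetical.

```python
def average_abs(f, weights, lo, hi):
    """Weighted average of |f| over [lo, hi); returns 0.0 if the mass is zero."""
    total = sum(weights[lo:hi])
    if total <= 0:
        return 0.0
    return sum(abs(f[x]) * weights[x] for x in range(lo, hi)) / total


def maximal_stopping_children(f, weights, lo, hi, threshold):
    """Maximal dyadic I strictly inside [lo, hi) whose average of |f| exceeds threshold."""
    found = []
    stack = []
    if hi - lo > 1:
        mid = (lo + hi) // 2
        stack = [(lo, mid), (mid, hi)]
    while stack:
        a, b = stack.pop()
        if average_abs(f, weights, a, b) > threshold:
            found.append((a, b))  # maximal: do not descend any further
        elif b - a > 1:
            m = (a + b) // 2
            stack.extend([(a, m), (m, b)])
    return found


def standard_stopping_data(f, weights, factor=4.0):
    """Return {F: alpha_f(F)} with alpha_f(F) the weighted average of |f| on F."""
    data = {}
    todo = [(0, len(weights))]
    while todo:
        lo, hi = todo.pop()
        alpha = average_abs(f, weights, lo, hi)
        data[(lo, hi)] = alpha
        todo.extend(maximal_stopping_children(f, weights, lo, hi, factor * alpha))
    return data
```

By construction the stopping children of any F carry at most a quarter of the mass of F, since each has average of |f| larger than four times that of F; this is the source of the Carleson property for this choice of data.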

Note that we do not assume that if $F, F' \in \mathcal{F}$, and $F' \subsetneq F$, then $\alpha_f(F') > C_0\,\alpha_f(F)$. Indeed, for 
some applications, this property will not hold. This missing hypothesis is replaced by properties 
(3.3) and (3.5). 



Let $\mathcal{F}$, $\alpha_f$ be Calderón-Zygmund stopping data for $f \in L^2_0(I_0, \sigma)$, and let $\mathcal{G}$, $\alpha_g$ be similar data 
for $g \in L^2_0(I_0, w)$. Define 

$$B^{\mathrm{near}}_{\mathcal{F},\mathcal{G}}(f,g) := \sum_{\substack{(F,G) \in \mathcal{F}\times\mathcal{G} \\ \pi_{\mathcal{F}} G = F \text{ or } \pi_{\mathcal{G}} F = G}} B(P^\sigma_F f, P^w_G g). \tag{3.6}$$

This is a subtle definition. It is important to note that for fixed $F \in \mathcal{F}$, one can have many $G \in \mathcal{G}$ 
with $\mathcal{F}$-parent $F$. One should also note that if the stopping data for $\mathcal{F}$ is in some sense 'trivial', 
the bilinear form above is essentially indistinguishable from $\langle H_\sigma f, g\rangle_w$. On the other hand, the 
forms 

$$\sum_{G \in \mathcal{G} \,:\, \pi_{\mathcal{F}} G = F} \langle H_\sigma P^\sigma_F f, P^w_G g\rangle_w , \qquad F \in \mathcal{F},$$

are much better, in that $P^\sigma_F f$ is more structured than an arbitrary $L^2(\sigma)$ function. This is one of 
the main results of this paper. 

Theorem 3.7. It holds that 

$$|B(f,g) - B^{\mathrm{near}}_{\mathcal{F},\mathcal{G}}(f,g)| \lesssim \mathcal{H}\|f\|_\sigma \|g\|_w .$$

We will refer to the application of this Theorem as the parallel corona. The proof of the 
Theorem depends upon the functional energy inequality in §7. 

Proof. It is worth emphasizing that the presence of stopping data for both functions yields a 
seemingly essential reduction in complexity of the proof. Still, the proof requires some careful 
accounting of terms, with the fundamental fact needed for the proof being the functional energy 
inequality. 

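The inner product $\langle H_\sigma f, g\rangle_w$ is expanded as 

$$\langle H_\sigma f, g\rangle_w = \sum_{F \in \mathcal{F}} \sum_{G \in \mathcal{G}} \langle H_\sigma P^\sigma_F f, P^w_G g\rangle_w .$$

The term $B^{\mathrm{near}}_{\mathcal{F},\mathcal{G}}(f, g)$ is a sum over pairs $(F, G) \in \mathcal{F} \times \mathcal{G}$ where either $F$ is the minimal element of 
$\mathcal{F}$ containing $G$, or the reverse statement holds. The complementary pairs $(F, G)$ are those that 
are either disjoint, or $F \subset G$, but $G \in \mathcal{G}$ is not minimal with respect to inclusion, or the reverse 
statement holds. The difference between this and $\langle H_\sigma f, g\rangle_w$ is then the sum of the terms 

$$B^{\mathrm{disjoint}}(f, g) := \sum_{\substack{(F,G) \in \mathcal{F}\times\mathcal{G} \\ F \cap G = \emptyset}} \langle H_\sigma P^\sigma_F f, P^w_G g\rangle_w ,$$

$$B^{\mathrm{up}}(f, g) := \sum_{\substack{(F,G) \in \mathcal{F}\times\mathcal{G} \\ \pi_{\mathcal{F}} G \subsetneq F}} \langle H_\sigma P^\sigma_F f, P^w_G g\rangle_w ,$$

and a third form $B^{\mathrm{down}}(f, g)$, which is dual to the 'up' form, and so we do not explicitly address 
it. (We are dropping the subscripts $\mathcal{F}$ and $\mathcal{G}$, as these will be fixed for the remainder of the 
argument.) 

In the 'disjoint' form, we require $F \cap G = \emptyset$, so that it is the immediate corollary to (3.10) that 
there holds 

$$|B^{\mathrm{disjoint}}(f,g)| \lesssim \mathcal{H}\|f\|_\sigma \|g\|_w .$$

It remains to consider the form $B^{\mathrm{up}}(f, g)$. In it, we require that $\pi_{\mathcal{F}} G \subsetneq F$. 
It is the consequence of the elementary estimates (3.10) and (3.11) that for the up form it 
suffices to control the bilinear form 

$$B(f,g) := \sum_{(I,J) \,:\, \pi_{\mathcal{F}}(\pi_{\mathcal{G}} J) \subsetneq I} E^\sigma_{I_J}\Delta^\sigma_I f \cdot \langle H_\sigma I_J, \Delta^w_J g\rangle_w .$$

Recall that $I_J$ is the child of $I$ that contains $J$. 
For $F \in \mathcal{F}$, let 

$$g_F := \sum_{J \,:\, \pi_{\mathcal{F}}(\pi_{\mathcal{G}} J) = F} \Delta^w_J g .$$

It is routine to check that these functions are $\mathcal{F}$-adapted. Therefore, the required estimate 
follows from Corollary 7.5. The proof is complete. 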

□ 

Remark 3.8. In the relevant literature, a corona refers to a decomposition of the bilinear form 
B(f, g). Here, we are using the same term to indicate passage from a more general term to its 
essential part. 



This section concludes with some standard facts in the subject. Define the collections of pairs 
of intervals 

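$$\mathcal{E} := \{(I,J) \in \mathcal{D}^\sigma \times \mathcal{D}^w : I \cap J = \emptyset \text{ or } 2^{-p}|J| \le |I| \le 2^{p}|J|\},$$
$$\mathcal{E}^{\subset} := \{(I,J) \in \mathcal{D}^\sigma \times \mathcal{D}^w : J \subset I\},$$

and finally let $\mathcal{E}^{\supset}$ be the collection of pairs of intervals dual to $\mathcal{E}^{\subset}$. For the collection $\mathcal{E}^{\subset}$, as $J \subset I$, 
we have that $J$ is contained in a child $I_J$ of $I$. 

Lemma 3.9. We have the inequality below, for any integer $p > r$. 

$$\sum_{(I,J) \in \mathcal{E}} |\langle H_\sigma \Delta^\sigma_I f, \Delta^w_J g\rangle_w| \lesssim \mathcal{H}\|f\|_\sigma \|g\|_w . \tag{3.10}$$

The same inequality holds for the term below, and its dual, with $\mathcal{E}^{\subset}$ replaced by $\mathcal{E}^{\supset}$, and the roles 
of $w$ and $\sigma$ reversed. 

$$\sum_{(I,J) \in \mathcal{E}^{\subset}} \big| E^\sigma_{I - I_J}\Delta^\sigma_I f \cdot \langle H_\sigma (I - I_J), \Delta^w_J g\rangle_w \big| \tag{3.11}$$

In this expression, $I_J$ is the child of $I$ containing $J$. 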

This Lemma is implicit in [29]. The paper [9, Section 8] specifically points to this form of the 
Lemma; that section can be taken as a proof. We do not recall the proof. 
4. A Function Theory 

In the first two sections, we build some function theory, focusing on the role of unbalanced 
Haar functions. We then apply that to the Hilbert transform in the third section. The function 
classes defined here are highly dependent upon the choice of dyadic grid, which is taken to be 
fixed. The main result of this section is Theorem 4.8, providing very sharp sufficient conditions 
for the two weight inequality, though in language specific to a dyadic grid, and testing the bilinear 
form over only functions that are good in that grid. 

4.1. Bounded Fluctuation. The following class of functions extends the notion of being bounded, 
and is basic to our analysis. 

Definition 4.1. A function $f$ is of bounded fluctuation on interval $I_0$, and we write $\|f\|_{b^\sigma_1(I_0)} \le C$, 
if these two conditions hold. 

(1) $f \in L^2_0(I_0; \sigma)$. 

(2) On each dyadic interval $I \subset I_0$ on which $f$ is not constant, it holds that $|E^\sigma_I f| \le C$. 

We then take $\|f\|_{b^\sigma_1(I_0)}$ to be the best constant in this last condition, and also set 

$$\|f\|^2_{B^\sigma_1(I_0)} := \|f\|^2_{b^\sigma_1(I_0)}\,\sigma(I_0) + \|f\|^2_\sigma .$$

The point of this definition is that a function $f$ has two important quantities. The first is the 
number $\|f\|_{b^\sigma_1(I_0)}$, which is akin to the $L^\infty$ norm of a bounded function. The second, the norm 
$\|f\|_{B^\sigma_1(I_0)}$, is motivated by the quasi-orthogonality considerations basic to this paper. We will define 
more of these quasi-norms. 

If we take standard Calderón-Zygmund stopping data for $f$, and $f$ is balanced, then $\|P^\sigma_F f\|_\infty \lesssim 
\alpha_f(F)$, which is a final condition for us, in light of our hypotheses in the main theorem. But, if $f$ 
is unbalanced, $P^\sigma_F f$ is only of bounded fluctuation with $\|P^\sigma_F f\|_{b^\sigma_1(F)} \lesssim \alpha_f(F)$. In both cases, we have 

$$\sum_{F \in \mathcal{F}} \|P^\sigma_F f\|^2_{B^\sigma_1(F)} \lesssim \|f\|^2_\sigma . \tag{4.2}$$

For consistency of notation, let us set $\|f\|^2_{B^\sigma_2(I_0)}$ to be the infimum of the expressions $\sum_{F \in \mathcal{F}} \|f_F\|^2_{B^\sigma_1(F)}$, 
subject to the conditions $f = \sum_{F \in \mathcal{F}} f_F$, $P^\sigma_F f = f_F$, and $\mathcal{F}$ is $\sigma$-Carleson. We have $\|f\|_{L^2(I_0,\sigma)} \simeq 
\|f\|_{B^\sigma_2(I_0)}$. 

A set $\mathcal{Q} \subset \mathcal{D}$ is said to be convex if for all $I \subset I' \subset I''$, if $I, I'' \in \mathcal{Q}$, then so is $I'$. A 
projection $Q^\sigma f = \sum_{I \in \mathcal{Q}} \Delta^\sigma_I f$ associated with a convex subset of $\mathcal{D}$ is a difference of two conditional 
expectation operators. In particular, if $f$ is a bounded function, then so is $Q^\sigma f$. This proposition 
is elementary, yet useful in our recursive application of the parallel corona. 

Proposition 4.3. Let $\mathcal{Q}$ be a convex set, and let $Q^\sigma$ denote the corresponding projection. Let $f$ 
have Calderón-Zygmund stopping data $\mathcal{F}$ and $\alpha_f$. These properties hold. 

(1) If $f$ is balanced, then $P^\sigma_F f$ is a bounded function, and in particular, $\|P^\sigma_F f\|_\infty \lesssim \alpha_f(F)$. 

(2) If $f$ is unbalanced, the function $P^\sigma_F f$ is of bounded fluctuation on $F$, with constant at most 
$2C_{cz}\alpha_f(F)$. In particular, if $K$ is a maximal interval with 

$$|E^\sigma_K P^\sigma_F f| > 2C_{cz}\,\alpha_f(F) , \tag{4.4}$$

then necessarily for each interval $I$ in the Haar support of $P^\sigma_F f$, we have $K \subset I$ or $I \cap K = \emptyset$. 
Moreover $\pi K$ is in the Haar support of $f$. Here, $C_0$ is as in Definition 3.4. 

(3) Both of the assertions hold for $Q^\sigma P^\sigma_F f$, with constants multiplied by 2. 

Proof. (1) The projection $P^\sigma_F$, defined at the end of Definition 3.4, is a convex projection. 
$P^\sigma_F f(x)$ is supported on $F$, and at each point $x \in F$, it is the difference 

$$E^\sigma_K f - E^\sigma_F f ,$$

where $K$ is the smallest dyadic interval that contains $x$ and has $\mathcal{F}$-parent $F$. Both averages 
are by assumption dominated in absolute value by $C_0\,\alpha_f(F)$, the key consequence of $f$ being 
balanced. So the conclusion follows. 

(2) If $f$ is unbalanced, the first part is just as in (1). If $K$ is a maximal interval as 
in (4.4), then the $\mathcal{F}$-parent of $K$ cannot be $F$. The conclusions then easily follow. 

(3) Convexity was the only property of $P^\sigma_F$ used above, so the same conclusion will hold for 
$Q^\sigma P^\sigma_F f$. 

□ 



4.2. Minimal Bounded Fluctuation. Functions of bounded fluctuation are still relatively un- 
structured, which necessitates this central refinement of the notion, namely minimal bounded 
fluctuation. We will refer to the notation established in the definition of balanced Haar functions, 
see Definition 3.1. 

Definition 4.5. We say that $f$ is a minimal bounded fluctuation function, and write $f \in 
\mathrm{MBF}^\sigma(I_0)$, if $f$ is in $B^\sigma_1(I_0)$, and if (1) $f$ is unbalanced, (2) there is a collection $\mathcal{K}$ of disjoint dyadic 
subintervals of $I_0$, the supporting intervals of $f$, so that for each $K \in \mathcal{K}$, we have $K = (\pi K)_{\mathrm{small}}$ 
and 

$$E^\sigma_K \Delta^\sigma_{\pi K} f < 0 ,$$

(3) the Haar support of $f$ equals $\pi\mathcal{K} := \{\pi K : K \in \mathcal{K}\}$ (so the Haar support of $f$ is minimal), 
and (4) for each dyadic interval $I$ not contained in some interval $K \in \mathcal{K}$, it holds that $0 \le E^\sigma_I f \le 
\|f\|_{b^\sigma_1(I_0)}$. Note in particular, that this expectation equals a sum of positive quantities: 

$$E^\sigma_I f = \sum_{K \in \mathcal{K} \,:\, I \subset (\pi K)_{\mathrm{large}}} E^\sigma_I \Delta^\sigma_{\pi K} f .$$

We draw this conclusion: Any Haar projection of a function of minimal bounded fluctuation is 
again of minimal bounded fluctuation. 

The need for the definition arises from the case where $\pi\mathcal{K}$ has substantial overlap. We take 
$\|f\|_{b^\sigma_0(I_0)}$ to be the best constant $\gamma$ in a decomposition of $f = f_0 + f_1 \in L^2_0(I_0)$, where $\|f_0\|_\infty \le \gamma$ 
and $f_1 \in \mathrm{MBF}^\sigma(I_0)$ with $\|f_j\|_{b^\sigma_1(I_0)} \le \gamma$, $j = 0, 1$. Define 

$$\|f\|^2_{B^\sigma_0(I_0)} := \|f\|^2_{b^\sigma_0(I_0)}\,\sigma(I_0) + \|f\|^2_\sigma .$$

Keep in mind that the lower case letters in the norms indicate an $L^\infty$-like quantity, while capital 
letters indicate an $L^2$-like quantity, that must ultimately be combined with a quasi-orthogonality 
argument. 

We need a decomposition of a function of bounded fluctuation into a sum of bounded and 
minimal bounded fluctuation functions. 

Lemma 4.6. For any $0 < \varepsilon < 1$, there is a constant $C_\varepsilon$ so that this holds. Let $f \in B^\sigma_1(I_0)$ be 
unbalanced. There is an orthogonal Haar decomposition of $f$ into functions $f = \phi_0 + \phi_1 + \phi_2$ so 
that these conditions hold. 

(1) $\|\phi_2\|_\sigma \le \varepsilon\|f\|_{B^\sigma_1(I_0)}$. (Note the use of the $L^2(\sigma)$ norm here.) 

(2) The function $\phi_0$ is bounded by $C_\varepsilon\|f\|_{b^\sigma_1(I_0)}$. 

(3) The function $\phi_1$ has Calderón-Zygmund stopping data $\mathcal{F}_1$ and $\alpha_{\phi_1}(\cdot)$ such that for each 
$F \in \mathcal{F}_1$, it holds that $P^\sigma_F \phi_1$ is of minimal bounded fluctuation and $\|P^\sigma_F \phi_1\|_{b^\sigma_1(F)} \lesssim \alpha_{\phi_1}(F)$. 

Proof. We can assume that $\|f\|_{b^\sigma_1(I_0)} = 1$. Let $\mathcal{K}$ be those intervals $K$ for which 

$$E^\sigma_K \Delta^\sigma_{\pi K} f < -4 .$$

So, these are the dyadic intervals for which the martingale difference associated with the parent 
takes a large negative value on $K$. Observe that these intervals are pairwise disjoint, for if not, 
then for some interval $K \in \mathcal{K}$ we have that $f$ is not constant on both $K$ and $\pi K$. Then, 

$$4 < |E^\sigma_K \Delta^\sigma_{\pi K} f| = |E^\sigma_K f - E^\sigma_{\pi K} f| \le 2 .$$

And this is a contradiction. 

Then, take $\phi_1 = \sum_{K \in \mathcal{K}} \Delta^\sigma_{\pi K} f$. Note that $\phi_1$ satisfies most of the properties of the definition 
of minimal bounded fluctuation, but it certainly need not be of bounded fluctuation itself. Take 
$\mathcal{L}_1$ to be the maximal intervals in the collection $\pi\mathcal{K} := \{\pi K : K \in \mathcal{K}\}$. (Keep in mind that we 
need this Lemma in the case that $\mathcal{L}_1$ is much smaller than $\pi\mathcal{K}$.) 

Then, for $L \in \mathcal{L}_1$, the function $\phi_1 \cdot L$ has standard Calderón-Zygmund stopping data $\mathcal{F}_{1,L}$ and 
$\alpha_{\phi_1,L}(\cdot)$. It is clear from the construction that each projection $P^\sigma_F(\phi_1 \cdot L)$ is a function of minimal 
bounded fluctuation, for $F \in \mathcal{F}_{1,L}$. 

We can complete this to (non-standard) Calderón-Zygmund stopping data for $\phi_1$. Take $\mathcal{F}_1 := 
\{I_0\} \cup \bigcup_{L \in \mathcal{L}_1} \mathcal{F}_{1,L}$ and set $\alpha_{\phi_1}(I_0) := 0$; otherwise $\alpha_{\phi_1}$ takes the obvious value from the standard 
Calderón-Zygmund stopping data. It is evident from the construction that the third conclusion 
of the Lemma holds. 

Let $\tilde\phi_0 := f - \phi_1$. This is exactly the sort of function to which the conclusion of Lemma 4.7 
applies. Namely, $\tilde\phi_0$ is a Haar projection of $f$ onto those martingale differences with $L^\infty$-norm 
bounded by four. Hence, $\tilde\phi_0$ has exponential moments. 
Choosing $C_\varepsilon \simeq \log 1/\varepsilon$, set $\tilde\phi_0 = \phi_0 + \phi_2$, where 

$$\phi_0 := \sum_{I \,:\, I \not\subset \{M\tilde\phi_0 > C_\varepsilon\}} \Delta^\sigma_I \tilde\phi_0 .$$

Then, it follows that $\phi_0$ has $L^\infty$-norm at most $4 + C_\varepsilon$. Meanwhile, $\|\phi_2\|_\sigma \lesssim \varepsilon\,\sigma(I_0)^{1/2}$, which 
completes the proof. 

□ 

There is a non-homogeneous variant of the John-Nirenberg estimate. 

Lemma 4.7. Suppose that $f \in B^\sigma_1(I_0)$, and $\tilde f$ is any Haar projection of $f$ for which $\|\Delta^\sigma_I \tilde f\|_\infty \le 
4\|f\|_{b^\sigma_1(I_0)}$, for all $I$. Then, we have the distributional estimate 

$$\sigma\big(|\tilde f| > C\lambda\|f\|_{b^\sigma_1(I_0)}\big) \lesssim \exp(-\lambda)\,\sigma(I_0) , \qquad \lambda > 1 ,$$

where $C$ is an absolute constant. 

Proof. Assume that $\|f\|_{b^\sigma_1(I_0)} = 1$. The fundamental properties are that (1) the martingale 
differences of $\tilde f$ are bounded by 4, (2) the bounded fluctuation property of $f$, and (3) the universal 
weak-type $L^1(\sigma)$ bound for Haar projections. 

We will show that there is an absolute constant $C > 4$ so that for $\lambda > 1$, 

$$\sigma(|\tilde f| > \lambda + C) \le e^{-1}\sigma(|\tilde f| > \lambda) .$$

A standard induction on $\lambda$ then completes the proof. 

Let $\mathcal{I}_\lambda$ be the maximal subintervals $I$ of $I_0$ such that $|E^\sigma_I \tilde f| > \lambda$. It follows that the expectation 
cannot be more than $\lambda + 4$ in absolute value, by (1) above. Therefore, 

$$\sigma(|\tilde f| > \lambda + C) \le \sum_{I \in \mathcal{I}_\lambda} \sigma(\{x \in I : |\tilde f - E^\sigma_I \tilde f| > C - 4\}) .$$

If the set on the right is not empty, it follows that $E^\sigma_I |f| \le 1$ by (2) above. Now, we are only 
interested in the function $\tilde f - E^\sigma_I \tilde f$ on the interval $I$, and it is a Haar projection of $f\mathbf{1}_I$. This 
operation is weakly bounded on $L^1(I, \sigma)$, which is (3) above. Hence, 

$$\sigma(|\tilde f - E^\sigma_I \tilde f| > C - 4) \lesssim (C-4)^{-1}\|f\mathbf{1}_I\|_{L^1(\sigma)} \le (C-4)^{-1}\sigma(I) .$$

This clearly completes the proof. □ 
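
The "standard induction" invoked in the proof can be spelled out. The following is a sketch under the normalization of the proof, writing $\tilde f$ for the Haar projection of the Lemma and $C$ for the absolute constant of the one-step estimate; the base threshold 2 and the final constant $2 + C$ are choices made here for illustration, not taken from the paper.

```latex
% Iterate the one-step estimate, starting from the threshold 2 (> 1):
\sigma(|\tilde f| > 2 + kC)
  \le e^{-1}\,\sigma(|\tilde f| > 2 + (k-1)C)
  \le \cdots
  \le e^{-k}\,\sigma(|\tilde f| > 2)
  \le e^{-k}\,\sigma(I_0),
  \qquad k = 0, 1, 2, \dots
% Given \lambda \ge 1, take k = \lfloor \lambda \rfloor, so that
% \lambda - 1 < k \le \lambda and (2 + C)\lambda \ge 2 + C\lambda \ge 2 + kC. Hence
\sigma\bigl(|\tilde f| > (2 + C)\lambda\bigr)
  \le \sigma(|\tilde f| > 2 + kC)
  \le e^{-k}\,\sigma(I_0)
  \le e \cdot e^{-\lambda}\,\sigma(I_0).
```

This gives the distributional estimate with $2 + C$ in place of the constant in the statement and with implied constant $e$.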

4.3. Sufficient Conditions for the Two Weight Inequality. This is very nearly a characteri- 
zation of the two weight inequality, and an essential intermediate step in our goal. 

Theorem 4.8. Suppose that the pair of weights $(\sigma, w)$ do not share a common point mass, and 
fix a dyadic grid $\mathcal{D}$. Consider the two weight inequality 

$$|\langle H_\sigma f, g\rangle_w| \le \mathcal{N}_{\mathrm{good}} \|f\|_\sigma \|g\|_w , \tag{4.9}$$

where it is required that $f = P^\sigma_{\mathrm{good}} f \in L^2(\sigma)$ and $g = P^w_{\mathrm{good}} g \in L^2(w)$. It holds that $\mathcal{N}_{\mathrm{good}} \lesssim 
\mathcal{H} + \mathcal{B}$, where the latter constant is the best constant in the inequalities below, for $I \in \mathcal{D}$: 

$$\int_I |H_\sigma f|^2\,dw \le \mathcal{B}^2 \|f\|^2_{B^\sigma_0(I)} , \qquad \int_I |H_w g|^2\,d\sigma \le \mathcal{B}^2 \|g\|^2_{B^w_0(I)} .$$

Note that goodness depends upon the grid, which is taken fixed in (4.9). Certainly the testing 
conditions above are necessary for the boundedness of the bilinear form. We have not sought 
to show that the $A_2$ condition is necessary from the condition (4.9). But, it is a consequence 
of the random grid construction that $\mathcal{N} \lesssim \mathbb{E}_{\mathcal{D}}\,\mathcal{N}_{\mathrm{good}}$, where the expectation over grids is taken 
appropriately; for details on this see [8,29]. 

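The Theorem is a consequence of the parallel corona. Using our definition of the function 
classes above, define several constants $\mathcal{N}_{i,j}$, $0 \le i \le j \le 2$, to be the best constant in the 
inequalities 

$$|\langle H_\sigma f, g\rangle_w| \le \mathcal{N}_{i,j}\|f\|_{B^\sigma_i(I)}\|g\|_w , \qquad g \in B^w_j(I),\ f \text{ is good},$$

$$|\langle H_\sigma f, g\rangle_w| \le \mathcal{N}_{i,j}\|f\|_\sigma\|g\|_{B^w_i(I)} , \qquad f \in B^\sigma_j(I),\ g \text{ is good}.$$

Note that the right hand side places an $L^2$ norm on the function in the larger function class. An 
important consequence of the parallel corona is 

Theorem 4.10. Let $1 \le i \le j \le 2$. There holds 

$$\mathcal{N}_{i,j} \lesssim \mathcal{H} + \begin{cases} \mathcal{N}_{i-1,j} & i = j , \\ \mathcal{N}_{i-1,j} + \mathcal{N}_{i,j-1} & i < j . \end{cases}$$

This gives the following inequalities, by recursive application. 

$$\mathcal{N}_{\mathrm{good}} \le \mathcal{N}_{2,2} \lesssim \mathcal{H} + \mathcal{N}_{1,2} \lesssim \mathcal{H} + \mathcal{N}_{0,2} + \mathcal{N}_{1,1} \lesssim \mathcal{H} + \mathcal{N}_{0,2} + \mathcal{N}_{0,1} \simeq \mathcal{H} + \mathcal{N}_{0,2} .$$

The conditions $\mathcal{N}_{0,2}$ are the testing hypothesis on functions in the classes $B^\sigma_0(I)$ and $B^w_0(I)$, and 
trivially, $\mathcal{N}_{0,1} \le \mathcal{N}_{0,2}$. In §6 we will show that testing over MBF functions is controlled by the 
$A_2$ and the $L^\infty$ testing constant, completing our proof of the main Theorem. 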

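Proof. The basic tools here are (4.2), the parallel corona, and the more complicated Lemma 4.6. First, an easy case, namely $\mathcal{N}_{2,2} \lesssim \mathcal{H} + \mathcal{N}_{1,2}$. Indeed, taking $f \in L^2(\sigma)$ and $g \in L^2(w)$, we can take Calderón-Zygmund stopping data $\mathcal{F}$ and $\alpha_f(\cdot)$ for $f$, and $\mathcal{G}$ and $\alpha_g(\cdot)$ for $g$. Then, applying the parallel corona leads immediately to

$|\langle H_\sigma f, g\rangle_w| \lesssim \mathcal{H}\|f\|_\sigma\|g\|_w + |B^{\mathrm{near}}_{\mathcal{F},\mathcal{G}}(f,g)|.$

Recall that the 'near' form is defined in (3.6). Since $P^\sigma_F f \in B_1^\sigma(F)$, for each $F \in \mathcal{F}$, we have

$|B^{\mathrm{near}}_{\mathcal{F},\mathcal{G}}(f,g)| \le \mathcal{N}_{1,2} \sum_{F \in \mathcal{F}} \|P^\sigma_F f\|_{B_1^\sigma(F)} \Bigl\| \sum_{G \in \mathcal{G} : \pi_{\mathcal{F}} G = F} P^w_G g \Bigr\|_w$
$\qquad \lesssim \mathcal{N}_{1,2} \|f\|_\sigma \|g\|_w,$

where the last line follows from quasi-orthogonality (4.2). This completes the proof.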

The arguments involving $\mathcal{N}_{i,j}$ with $i$ or $j$ equal to one are more complicated, due to the statement of Lemma 4.6, and in particular, the fact that the absorption term associated with $\phi_2$ involves $L^2(\sigma)$. For the sake of specificity, let us argue that $\mathcal{N}_{1,2} \lesssim \mathcal{H} + \mathcal{N}_{1,1} + \mathcal{N}_{0,2}$. All other cases are similar.

Select interval $I_0$, and $f$, $g$ so that

(4.11) $\tfrac12 \mathcal{N}_{1,2} \|f\|_{B_1^\sigma(I_0)} \|g\|_w \le |\langle H_\sigma f, g\rangle_w|.$

Applying Lemma 4.6 to $f$, with a choice of $\varepsilon$ to be specified, we have $f = f_0 + f_1 + f_2$, where $f_0$ is a function bounded by $C_\varepsilon$, and $f_1$ has non-trivial Calderón-Zygmund stopping data $\mathcal{F}_1$ and $\alpha_{f_1}(\cdot)$, so that

$\|f\|^2_{B_1^\sigma(I_0)} \gtrsim \sum_{F \in \mathcal{F}_1} \|P^\sigma_F f_1\|^2_{B_1^\sigma(F)}.$

For $f_2$, there holds $\|f_2\|_\sigma \le \varepsilon \|f\|_{B_1^\sigma(I_0)}$. On the other hand, we take standard Calderón-Zygmund stopping data $\mathcal{G}$ and $\alpha_g(\cdot)$ for $g$.

Note that for the terms that involve $f_2$ and $g$ we have

$|\langle H_\sigma f_2, g\rangle_w| \le \mathcal{N}_{2,2} \|f_2\|_\sigma \|g\|_w \lesssim \varepsilon\{\mathcal{H} + \mathcal{N}_{1,2}\} \|f\|_{B_1^\sigma(I_0)} \|g\|_w$

by the first step in the proof. Thus, this term can be absorbed into the left hand side of (4.11), for $\varepsilon$ sufficiently small.

The main term is then $\langle H_\sigma(f_0 + f_1), g\rangle_w$. For the function $f_0$, we just use the $L^\infty$-bound, and the constant $\mathcal{N}_{0,2}$:

$|\langle H_\sigma f_0, g\rangle_w| \le C_\varepsilon \mathcal{N}_{0,2}\, \sigma(I_0)^{1/2} \|g\|_w.$

That leaves the function $f_1$. We have stopping data for both $f_1$ and $g$, so that repeating the argument involving the parallel corona and the quasi-orthogonality, we see that

$|\langle H_\sigma f_1, g\rangle_w| \lesssim \{\mathcal{H} + \mathcal{N}_{0,2} + \mathcal{N}_{1,1}\} \|f\|_{B_1^\sigma(I_0)} \|g\|_w.$

Collecting estimates, we have shown that

$\tfrac12 \mathcal{N}_{1,2} \lesssim \mathcal{H} + \varepsilon \mathcal{N}_{1,2} + C_\varepsilon \mathcal{N}_{0,2} + \mathcal{N}_{1,1}.$

As this holds for all $0 < \varepsilon < 1$, this completes the proof of $\mathcal{N}_{1,2} \lesssim \mathcal{H} + \mathcal{N}_{1,1} + \mathcal{N}_{0,2}$. The estimate for $\mathcal{N}_{1,1}$ is similar to those presented.

□
5. Energy, Monotonicity, and Poisson



Our Theorem is particular to the Hilbert transform, and so depends upon special properties 
of it. They largely extend from the fact that the derivative of — 1/y is positive. The following 
Monotonicity Property for the Hilbert transform was observed in [9, Lemma 5.8], and is basic to 
the analysis of the functional energy inequality. We will frequently use the notation $J \Subset I$ to mean $J \subset I$ and $2^r |J| \le |I|$, so that the property of $J$ being good becomes available to us.

Lemma 5.1 (Monotonicity Property). Suppose that $\nu$ is a signed measure, and $\mu$ is a positive measure with $\mu \ge |\nu|$, both supported outside an interval $I \in \mathcal{D}_\sigma$. Then, for good $J \Subset I$, and function $g \in L^2_0(J, w)$, it holds that

(5.2) $|\langle H\nu, g\rangle_w| \le \langle H\mu, \bar g\rangle_w \approx P(\mu, J)\,\bigl\langle \tfrac{x}{|J|}, \bar g \bigr\rangle_w.$

Here, $\bar g = \sum_{J'} |\hat g(J')|\, h^w_{J'}$ is a Haar multiplier applied to $g$.

Proof. By linearity, it suffices to consider the case of $g(x) = h^w_J(x)$. Namely, we should prove

$|\langle H\nu, h^w_J\rangle_w| \le \langle H\mu, h^w_J\rangle_w \approx P(\mu, J)\,\bigl\langle \tfrac{x}{|J|}, h^w_J \bigr\rangle_w.$

The function $H\mu$ is monotonically increasing on $I$, and we have defined the Haar functions so that the inner product $\langle H\mu, h^w_J\rangle_w$ is positive, as is $\langle x, h^w_J\rangle_w$. So the right hand side above is non-negative.

The Haar function is also constant on both halves of $J$, and has $w$-integral zero. Thus, there is a monotonic increasing map $\phi : J_+ \to J_-$ so that

$\langle H\nu, h^w_J\rangle_w = \int_{J_+} \bigl(H\nu(x) - H\nu(\phi(x))\bigr)\, h^w_J(x)\,dw.$

The Haar function is positive on $J_+$. Under the assumption that $|\nu| \le \mu$, we make the difference on the right both bigger in absolute value, and positive, by replacing $H\nu$ with $H\mu$. This proves the first inequality.

To compare to the Poisson integral, we examine the inner product $\langle H\mu, h^w_J\rangle_w$. Let us write, for $x \in J$ and $y \in \mathbb{R} - I$, and $x_J$ the center of $J$,

$\frac{1}{y - x} = \frac{1}{(y - x_J) - (x - x_J)} = \sum_{k=0}^{\infty} \frac{(x - x_J)^k}{(y - x_J)^{k+1}}.$





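Therefore, since $h^w_J$ has $w$-integral zero and the term $k = 0$ does not contribute, it follows that

$\langle H\mu, h^w_J\rangle_w = \sum_{k=1}^{\infty} \int_{\mathbb{R} - I} \int_J \frac{(x - x_J)^k}{(y - x_J)^{k+1}}\, h^w_J(x)\,dw(x)\,d\mu(y).$

Recall that the condition that $J$ be good implies that $\operatorname{dist}(\partial I, J) \ge 2^{(1-\varepsilon) r}|J|$. The term $k = 1$ is

$c\, P(\mu, J)\,\bigl\langle \tfrac{x}{|J|}, h^w_J \bigr\rangle_w,$

where $|c - 1| \lesssim 2^{(\varepsilon - 1) r}$. All of the higher order terms are geometrically less than this. For $k \ge 2$, note that $(x - x_J)\, h^w_J(x) \ge 0$ for $x \in J$, and critically, by examination,

$\Bigl| \frac{(x - x_J)^k}{(y - x_J)^{k+1}}\, h^w_J(x) \Bigr| \le 2^{(\varepsilon - 1) r (k-1)}\, \frac{(x - x_J)\, h^w_J(x)}{(y - x_J)^2}, \qquad x \in J,\ y \in \mathbb{R} - I,\ k \ge 2.$

□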
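The first inequality of the Monotonicity Property can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: all three measures are finite sums of point masses, $J = [0,1]$, the truncation of the Hilbert transform plays no role because the atoms of $\mu$ and $\nu$ stay away from $J$, and the helper names (`hilbert`, `haar_value`, `pair_with_haar`, `first_order`) are hypothetical, not taken from the paper. It compares $|\langle H\nu, h^w_J\rangle_w|$ with $\langle H\mu, h^w_J\rangle_w$, and the latter with the $k=1$ term of the expansion above.

```python
from math import sqrt

# Toy data (assumptions): J = [0, 1] with centre 0.5; w is supported in J,
# while mu >= |nu| are supported far outside J. Atoms are (position, mass).
W = [(0.1, 1.0), (0.3, 2.0), (0.7, 0.5), (0.9, 1.5)]
MU = [(-10.0, 3.0), (-6.0, 1.0), (8.0, 2.0), (20.0, 4.0)]
NU = [(-10.0, -2.0), (-6.0, 1.0), (8.0, -0.5), (20.0, 3.0)]
CENTRE = 0.5


def hilbert(measure, x):
    """H(measure)(x) = sum of mass / (y - x) over the atoms y of the measure."""
    return sum(m / (y - x) for y, m in measure)


def haar_value(x):
    """Value at x of the w-Haar function of J: positive on the right half,
    negative on the left half, w-mean zero and unit L^2(w) norm."""
    w_plus = sum(m for t, m in W if t >= CENTRE)
    w_minus = sum(m for t, m in W if t < CENTRE)
    total = w_plus + w_minus
    a = sqrt(w_minus / (w_plus * total))
    b = sqrt(w_plus / (w_minus * total))
    return a if x >= CENTRE else -b


def pair_with_haar(measure):
    """The inner product <H(measure), h_J^w>_w."""
    return sum(m * haar_value(x) * hilbert(measure, x) for x, m in W)


def first_order(measure):
    """The k = 1 term: (sum of mass / (y - x_J)^2) * <x - x_J, h_J^w>_w."""
    tail = sum(m / (y - CENTRE) ** 2 for y, m in measure)
    moment = sum(m * haar_value(x) * (x - CENTRE) for x, m in W)
    return tail * moment


lhs = abs(pair_with_haar(NU))
rhs = pair_with_haar(MU)
print(lhs, rhs, rhs / first_order(MU))
```

In this example the higher order terms are small because the atoms of $\mu$ are at distance at least $6.5$ from the center of $J$, which mimics the role of goodness in the proof.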



The concept of energy is fundamental to the subject. For interval $I$, define

$E(w,I)^2 := \mathbb{E}_I^{w(dx)}\, \mathbb{E}_I^{w(dx')}\, \frac{(x - x')^2}{|I|^2}.$

Now, consider the energy constant, the smallest constant $\mathcal{E}$ such that this condition holds, as presented or in its dual formulation. For all intervals $I_0$, all partitions $\mathcal{P}$ of $I_0$, it holds that

(5.3) $\sum_{I \in \mathcal{P}} P(\sigma I_0, I)^2\, E(w,I)^2\, w(I) \le \mathcal{E}^2 \sigma(I_0).$

It was shown in [8, Proposition 2.11] that $\mathcal{E} \lesssim A_2^{1/2} + \mathcal{T} = \mathcal{H}$. We will always estimate $\mathcal{E}$ by $\mathcal{H}$.

One should keep in mind that the concept of energy is related to the tails of the Hilbert 
transform. The energy inequality, and its multi-scale extension to the functional energy inequality, 
show that the control of the tails is very subtle in this problem. 
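To make the definition of energy concrete, here is a minimal sketch that evaluates $E(w,I)^2$ for a weight given by finitely many point masses. The atomic model of $w$, the closed interval convention, and the function name `energy_squared` are assumptions made for illustration only.

```python
def energy_squared(atoms, left, right):
    """E(w, I)^2 for I = [left, right] and w a finite sum of point masses.

    atoms is a list of (position, mass) pairs; only atoms inside I are used.
    The double average of (x - x')^2 / |I|^2 is taken with respect to w
    restricted to I and normalised to a probability measure.
    """
    inside = [(x, m) for x, m in atoms if left <= x <= right]
    total = sum(m for _, m in inside)
    if total == 0:
        return 0.0
    length = right - left
    double_sum = sum(
        m1 * m2 * (x1 - x2) ** 2 for x1, m1 in inside for x2, m2 in inside
    )
    return double_sum / (total ** 2 * length ** 2)


# Two equal masses at the endpoints of [0, 1] spread the weight as far as possible.
print(energy_squared([(0.0, 1.0), (1.0, 1.0)], 0.0, 1.0))
# A single point mass has zero energy.
print(energy_squared([(0.25, 3.0)], 0.0, 1.0))
```

The energy vanishes exactly when $w$ restricted to $I$ is a single point mass, and it never exceeds one, since $|x - x'| \le |I|$ on $I$.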

We also need the following elementary Poisson estimate from [29]; used occasionally in this 
argument, it is crucial to the proofs of Lemma 3.9 and (3.11). 

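Lemma 5.4. Suppose that $J \Subset I \subset I'$, and that $J$ is good. Then

(5.5) $|J|^{2\varepsilon - 1} P(\sigma(I' - I), J) \lesssim |I|^{2\varepsilon - 1} P(\sigma(I' - I), I).$

Proof. We have $\operatorname{dist}(J, I' - I) \ge |J|^{\varepsilon} |I|^{1-\varepsilon}$, so that for any $x \in I' - I$, we have

$\frac{1}{(|J| + \operatorname{dist}(x,J))^2} \lesssim \frac{|I|^{2\varepsilon}}{|J|^{2\varepsilon}} \cdot \frac{1}{(|I| + \operatorname{dist}(x,I))^2}.$

Hence, we have

$|J|^{2\varepsilon - 1} P(\sigma(I' - I), J) = |J|^{2\varepsilon - 1} \int_{I' - I} \frac{|J|}{(|J| + \operatorname{dist}(x,J))^2}\,d\sigma \lesssim |I|^{2\varepsilon} \int_{I' - I} \frac{1}{(|I| + \operatorname{dist}(x,I))^2}\,d\sigma.$

And this proves the inequality.

□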
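The Poisson estimate (5.5) can be tested on a small example. The following sketch assumes the kernel $|J| / (|J| + \operatorname{dist}(x,J))^2$ used in the proof above, a weight $\sigma$ made of finitely many point masses outside $I$, and the explicit implied constant $4$, which is what a two-case check of the pointwise kernel inequality gives; the function names and the sample data are hypothetical.

```python
def dist(x, interval):
    """Distance from the point x to the closed interval (lo, hi)."""
    lo, hi = interval
    return max(lo - x, 0.0, x - hi)


def poisson(atoms, interval):
    """P(sigma, J) = sum over atoms of mass * |J| / (|J| + dist(x, J))^2."""
    length = interval[1] - interval[0]
    return sum(m * length / (length + dist(x, interval)) ** 2 for x, m in atoms)


EPS = 0.25
I = (0.0, 1.0)
J = (0.5, 0.5 + 2.0 ** -8)          # |J| = 2^-8, well inside I
SIGMA_OUTSIDE_I = [(-3.0, 1.0), (-0.1, 2.0), (1.05, 0.5), (2.0, 1.0), (10.0, 4.0)]

len_i = I[1] - I[0]
len_j = J[1] - J[0]

# Goodness of J relative to I: J stays at distance |J|^eps |I|^(1-eps) from the boundary.
gap = min(J[0] - I[0], I[1] - J[1])
is_good = gap >= len_j ** EPS * len_i ** (1 - EPS)

lhs = len_j ** (2 * EPS - 1) * poisson(SIGMA_OUTSIDE_I, J)
rhs = len_i ** (2 * EPS - 1) * poisson(SIGMA_OUTSIDE_I, I)
print(is_good, lhs, rhs)
```

The gain $|J|^{2\varepsilon - 1}$ versus $|I|^{2\varepsilon - 1}$ is what later produces the factor $2^{-n(2 - 4\varepsilon)}$ when intervals $J$ of length $2^{-n}|I|$ are summed.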



6. Testing Minimal Bounded Fluctuation Functions 



In Theorem 4.8, we have reduced the two weight inequality to testing of functions in the class $B_0^\sigma(I_0)$, and the dual collection, in any fixed dyadic grid $\mathcal{D}$. These functions are the linear combinations of bounded functions and those of minimal bounded fluctuation. In this section, we show that testing of functions of minimal bounded fluctuation follows from the $A_2$ condition and testing of bounded functions. As a corollary, testing bounded functions is sufficient for the two weight inequality, which is the main result of this paper.



Theorem 6.1. The inequality below and its dual hold. For all intervals I and f £ Bq(Io) fl 



The very specific structure of functions in the class $\mathrm{MBF}^\sigma$ is essential to this argument, and even still, the argument is intricate. An outline of the arguments is as follows.

(1) There are some initial general steps, the first being the classical above and below splitting, common to most T1-type arguments.

(2) The second is a corona argument, another consequence of functional energy. This allows essential simplifications. One that we have not seen to date is the use of the energy corona, which smooths out certain irregularities of the pair of weights. This argument originates in [17], but is herein implemented under conditions which are necessary from the $A_2$ and testing hypotheses.

(3) The third is the passage to the stopping terms, the name coming from [17,29], see (6.11). 
This argument has been used before, especially the proof of Corollary 7.5. There however, 
one was 'stopping' at intervals in Calderon-Zygmund stopping data. Now, the 'stopping' 
intervals are not sparse, and our difficulties multiply. 

(4) After this, the argument comes in two parts, depending upon the position of the MBF 
function. 

The general steps begin with the above and below splitting of the bilinear form $\langle H_\sigma f, g\rangle_w$.

Corollary 6.2. Let $f \in L^2_0(I_0, \sigma)$ and $g \in L^2_0(I_0, w)$. It holds that



H^dw^^ + OUHfll 2 



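$\bigl| \langle H_\sigma f, g\rangle_w - \{ B^{\mathrm{above}}(f,g) + B^{\mathrm{below}}(f,g) \} \bigr| \lesssim \mathcal{H} \|f\|_\sigma \|g\|_w,$

where

$B^{\mathrm{above}}(f,g) := \sum_{I : I \subset I_0}\ \sum_{J : J \Subset I} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \cdot \langle H_\sigma I_J, \Delta^w_J g\rangle_w$

and $B^{\mathrm{below}}(f,g)$ is defined in the dual fashion.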

This is a corollary to the uncomplicated estimates (3.10) and (3.11). It is perhaps a useful 
remark that all prior arguments on this question have made the step above at an early stage of 
the argument. Our main theorem cannot be proved in this fashion, hence our use of the parallel 
corona. 

To establish Theorem 6.1, it suffices to establish this proposition. Note that the $L^\infty$ testing constant appears in the first estimate.

Proposition 6.3. The two inequalities below, and their duals, hold. For all intervals $I_0$,

(6.4) $|B^{\mathrm{above}}(f,g)| \lesssim \{A_2^{1/2} + \mathcal{T}_\infty\}\, \|f\|_\sigma \|g\|_{B_0^w(I_0)}, \qquad g \in \mathrm{MBF}^w(I_0),$

(6.5) $|B^{\mathrm{above}}(f,g)| \lesssim \mathcal{H}\, \|f\|_{B_0^\sigma(I_0)} \|g\|_w, \qquad f \in \mathrm{MBF}^\sigma(I_0).$




In the form $B^{\mathrm{above}}(f,g)$, we will say that $f$ is in the up position, and $g$ is in the down position. Our proof splits according to the position of the MBF function.

We need a notion of a corona. Let $\mathcal{F}$, $\alpha_f(\cdot)$ be Calderón-Zygmund stopping data for $f$. Set $P^\sigma_F$ as in Definition 3.4, and let $Q^w_F g := \sum_{J : \pi_{\mathcal{F}} J = F} \Delta^w_J g$ be the same projection into $L^2(w)$. Define

$B^{\mathrm{above}}_{\mathcal{F}}(f,g) := \sum_{F \in \mathcal{F}} B^{\mathrm{above}}(P^\sigma_F f, Q^w_F g).$

The corona estimate is

Theorem 6.6. There holds

$|B^{\mathrm{above}}(f,g) - B^{\mathrm{above}}_{\mathcal{F}}(f,g)| \lesssim \mathcal{H} \|f\|_\sigma \|g\|_w.$

It is an important complication that we do not know of any variant of this estimate for stopping data on the down function. As a consequence, in order to use the corona estimate, it is required that we have the $L^2(w)$ norm on $g$.

Proof. We apply functional energy. Expand

$B^{\mathrm{above}}(f,g) = \sum_{F' \in \mathcal{F}} \sum_{F \in \mathcal{F}} B^{\mathrm{above}}(P^\sigma_{F'} f, Q^w_F g).$

In the sum above, we can also add the restriction that $F' \supset F$. The case of $F' = F$ is the definition of $B^{\mathrm{above}}_{\mathcal{F}}(f,g)$, so that it suffices to estimate

$\sum_{F' \supsetneq F} B^{\mathrm{above}}(P^\sigma_{F'} f, Q^w_F g).$

Observe that the functions

$g_F := \sum_{J : \pi_{\mathcal{F}} J = F} \Delta^w_J g$

are $\mathcal{F}_{\subset}$-adapted in the sense of Definition 7.1. Therefore, the Theorem is a corollary to Corollary 7.5. The proof is complete.

□



There is a general construction, the energy corona, which structures the pair of weights in a 
significant way. 

Definition 6.7. Given interval $I_1 \subset I_0$, define $\mathcal{F}_{\mathrm{energy}}(I_1)$ to be the maximal subintervals $I \subset I_1$ such that there is a partition $\mathcal{J}(I)$ of $I$ into good intervals $J \Subset I$ with

(6.8) $\sum_{J \in \mathcal{J}(I)} P(\sigma I_1, J)^2\, E(w,J)^2\, w(J) > 10\, \mathcal{E}^2 \sigma(I).$

Here, $\mathcal{E}$ is the constant in (5.3), and it holds that $\mathcal{E} \lesssim \mathcal{H}$.

We then set $\mathcal{F} := \bigcup_{n \ge 0} \mathcal{F}_n$, where $\mathcal{F}_0 := \{I_0\}$, and inductively set $\mathcal{F}_{n+1} := \bigcup_{I \in \mathcal{F}_n} \mathcal{F}_{\mathrm{energy}}(I)$. These are the energy stopping intervals.

The fact that the energy inequality (5.3) holds shows that $\mathcal{F}$ is $\sigma$-Carleson. Let $\mathcal{M}$, $\alpha_f(\cdot)$ be standard Calderón-Zygmund stopping data for $f$, and then extend $\alpha_f$ to $\mathcal{F}$ by setting $\alpha_f(F) = \alpha_f(\pi_{\mathcal{M}} F)$. Then, $\mathcal{F} \cup \mathcal{M}$, with the extended function $\alpha_f$, is Calderón-Zygmund stopping data for $f$. We will refer to this as the energy stopping data for $f$. (It is this step that requires our general definition of stopping data.)

This definition will be useful to summarize a frequent hypothesis in lemmas below. 

Definition 6.9. Say that $f \in B_0^\sigma(I_0)$ is energy-regular if the condition that $I$ is contained in some $I' \in \mathcal{F}_{\mathrm{energy}}(I_0)$ implies that $\hat f(I) = 0$.

By the construction of the energy corona, and a straightforward application of Theorem 6.6, it suffices to assume that $f$ is energy-regular below.

Remark 6.10. The considerations above have their origins in [17,29], although they are used here 
with the necessary energy inequality for the first time. 

The stopping term will be the main focus of attention. Let $f \in B_0^\sigma(I_0)$, with $\|f\|_{B_0^\sigma(I_0)} = 1$. Further assume, as will be the case below, that if $K$ is an interval on which $f$ takes constant value greater than one in absolute value, then $g$ is constant on that interval. Write the argument of the Hilbert transform as $I_J = I_0 - (I_0 - I_J)$. This yields a decomposition of the 'above' form into two, the first is

$\Bigl| \sum_{I : I \subset I_0}\ \sum_{J : J \Subset I} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \cdot \langle H_\sigma I_0, \Delta^w_J g\rangle_w \Bigr| \lesssim \mathcal{H}\, \|f\|_{B_0^\sigma(I_0)}\, \sigma(I_0)^{1/2} \|g\|_w,$

by the fact that $f$ is of bounded fluctuation, and the interval testing condition. The second term is the stopping term:

(6.11) $B^{\mathrm{stop}}(f,g) := \sum_{I : I \subset I_0}\ \sum_{J : J \Subset I} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \cdot \langle H_\sigma(I_0 - I_J), \Delta^w_J g\rangle_w.$

We will make the standing assumption that $g = \sum_J |\hat g(J)|\, h^w_J$, which maximizes the relevant inner products in the stopping term. (Compare this to the proof of Corollary 7.5.)

6.1. Minimal Bounded Fluctuation in the Up Position. We concern ourselves with the proof 
of the estimate (6.5), under the assumption that f is energy-regular. 

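With an arbitrary $L^2$ function in the down position, we have only a small number of tools to control the stopping term. And so the proof turns on the very strong 'positivity' property of MBF functions noted already in Definition 4.5.

Take $f \in \mathrm{MBF}^\sigma(I_0)$, with $\|f\|_{B_0^\sigma(I_0)} = 1$, and supporting intervals $\mathcal{K}$, which are the pairwise disjoint intervals on which $\Delta^\sigma_{\pi K} f$ takes a negative value. Moreover the Haar support of $f$ equals $\pi\mathcal{K} := \{\pi K : K \in \mathcal{K}\}$.

First, we can assume that $g$ is constant on each interval $K \in \mathcal{K}$. To see this, set $g_K := \sum_{J : J \Subset K} \Delta^w_J g$. Using Lemma 5.1,

$|B^{\mathrm{stop}}(f, g_K)| \lesssim P(\sigma, K)\, \bigl\langle \tfrac{x}{|K|}, g_K \bigr\rangle_w.$

The intervals $\mathcal{K}$ are pairwise disjoint, so that the sum over $K \in \mathcal{K}$ of this last expression is controlled by Cauchy-Schwarz and the energy inequality (5.3). We can therefore assume that $g_K = 0$ for all $K$, proving our claim.

Second, the essential point is then this positivity property: For each interval $I \in \pi\mathcal{K}$, and $J$ in the Haar support of $g$, we have $\mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \ge 0$, and moreover

$0 \le \sum_{I : I \supsetneq J} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \le 1.$

Let $\mathcal{P}_0 := \{(I,J) : J \Subset I,\ \hat f(I) \ne 0,\ \hat g(J) \ne 0\}$. Given $\mathcal{P} \subset \mathcal{P}_0$, make these definitions. The first is a restriction of the stopping term, define

$B^{\mathrm{stop}}_{\mathcal{P}}(f,g) := \sum_{(I,J) \in \mathcal{P}} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \cdot \langle H_\sigma(I_0 - I_J), \Delta^w_J g\rangle_w.$

Also, define a notion of size as follows.

$\operatorname{size}(\mathcal{P}) := \sup_{I_1 : I_1 \subset I_0}\ \sum_{I : \exists (I,J) \in \mathcal{P},\ J \subset I_1 \subset I} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f.$

It is clear that $\operatorname{size}(\mathcal{P}_0) \le 1$. The main Lemma is as follows.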
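Lemma 6.12. A collection $\mathcal{P} \subset \mathcal{P}_0$ can be decomposed into $\mathcal{P}_{\mathrm{big}} \cup \mathcal{P}_{\mathrm{small}}$ so that

$|B^{\mathrm{stop}}_{\mathcal{P}_{\mathrm{big}}}(f,g)| \lesssim \operatorname{size}(\mathcal{P})\, \mathcal{H}\, \sigma(I_0)^{1/2} \|g\|_w,$
$\operatorname{size}(\mathcal{P}_{\mathrm{small}}) \le \tfrac12 \operatorname{size}(\mathcal{P}).$

It is clear that this Lemma concludes the proof. For starting at the collection $\mathcal{P}_0$, we can apply it recursively to gain a decomposition $\mathcal{P}_0 = \bigcup_{m \ge 1} \mathcal{P}_m$ with

$|B^{\mathrm{stop}}_{\mathcal{P}_m}(f,g)| \lesssim 2^{-m}\, \mathcal{H}\, \sigma(I_0)^{1/2} \|g\|_w.$

This is clearly summable in $m \ge 1$ to the bound we want.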

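Proof. Let $\tau = \operatorname{size}(\mathcal{P})$. The main recursion consists of constructing a collection $\mathcal{I}$ of disjoint intervals. Initialize $\mathcal{Q} \leftarrow \mathcal{P}$, $\mathcal{P}_{\mathrm{big}} \leftarrow \emptyset$, and $\mathcal{I} \leftarrow \emptyset$. While it holds that

$\operatorname{size}(\mathcal{Q}) > \tfrac12 \tau,$

consider intervals $I_1$ which satisfy

$\sum_{I : \exists (I,J) \in \mathcal{Q},\ J \subset I_1 \subset I} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f > \tfrac12 \tau.$

Take $I_1$ maximal with respect to inclusion, and from these, select one with $\sigma(I_1)$ maximal. Then, define $\mathcal{P}_{I_1} := \{(I,J) \in \mathcal{Q} : I \supset I_1 \supset J\}$, and update

$\mathcal{I} \leftarrow \mathcal{I} \cup \{I_1\}, \qquad \mathcal{P}_{\mathrm{big}} \leftarrow \mathcal{P}_{\mathrm{big}} \cup \mathcal{P}_{I_1}, \qquad \mathcal{Q} \leftarrow \mathcal{Q} - \mathcal{P}_{I_1}.$

The intervals in $\mathcal{I}$ are pairwise disjoint. For contradiction, assume that $I_2 \subsetneq I_1$ are both in $\mathcal{I}$. Then $I_1$ was selected for membership in $\mathcal{I}$ first, and all pairs $(I,J)$ with $J \subset I_1 \subset I$ were added to $\mathcal{P}_{I_1}$. Hence, for all $(I,J) \in \mathcal{P}_{I_2}$, it holds that $I \subsetneq I_1$. By definition,

$\sum_{v=1}^{2}\ \sum_{I : \exists (I,J) \in \mathcal{P}_{I_v},\ J \subset I_v \subset I} \mathbb{E}^\sigma_{I_J} \Delta^\sigma_I f \le \tau,$

yet for both $v = 1, 2$ the second sum exceeds $\tfrac12 \tau$, which is a contradiction.

We use the monotonicity property (5.1), and the energy-regularity of $f$, whence (6.8) fails for each $I \in \mathcal{I}$ with $\hat f(I) \ne 0$. These facts give us

where $g_I := \sum_{J : J \subset I} \Delta^w_J g$. Disjointness of the intervals $I$ and Cauchy-Schwarz conclude the proof of the Lemma. □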
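The selection loop in this proof can be illustrated on a toy dyadic configuration. The sketch below is not the argument of the paper: it assumes the threshold is one half of the initial size, it uses a hypothetical nonnegative weight `WEIGHT[I]` standing in for $\mathbb{E}^\sigma_{I_J}\Delta^\sigma_I f$ (which in the paper also depends on $J$ through $I_J$), and it breaks ties among maximal candidates by list order rather than by the $\sigma$-measure. Intervals are encoded as `(left, right)` tuples.

```python
def contains(outer, inner):
    """True when the interval inner is a subset of the interval outer."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def local_sum(pairs, weight, i1):
    """Sum weight[I] over distinct I with some (I, J) in pairs, J inside i1 inside I."""
    tops = {i for (i, j) in pairs if contains(i1, j) and contains(i, i1)}
    return sum(weight[i] for i in tops)


def size(pairs, weight, candidates):
    """Toy version of size(P): the largest local sum over the candidate intervals."""
    return max((local_sum(pairs, weight, i1) for i1 in candidates), default=0.0)


def split_by_size(pairs, weight, candidates):
    """Greedy split of pairs into (big, small, selected intervals)."""
    tau = size(pairs, weight, candidates)
    remaining, big, selected = set(pairs), set(), []
    while True:
        heavy = [i1 for i1 in candidates if local_sum(remaining, weight, i1) > tau / 2]
        if not heavy:
            break
        # Keep only candidates that are maximal with respect to inclusion.
        maximal = [i1 for i1 in heavy
                   if not any(k != i1 and contains(k, i1) for k in heavy)]
        i1 = maximal[0]
        chosen = {(i, j) for (i, j) in remaining if contains(i, i1) and contains(i1, j)}
        selected.append(i1)
        big |= chosen
        remaining -= chosen
    return big, remaining, selected


# Hypothetical data: four "up" intervals, three "down" intervals, dyadic in [0, 1].
A, B, C, D = (0.0, 1.0), (0.0, 0.5), (0.0, 0.25), (0.5, 1.0)
J1, J2, J3 = (0.0, 0.125), (0.5, 0.625), (0.25, 0.375)
WEIGHT = {A: 0.25, B: 0.5, C: 0.25, D: 0.125}
PAIRS = {(A, J1), (B, J1), (C, J1), (A, J2), (D, J2), (A, J3), (B, J3)}
CANDIDATES = [A, B, C, D, J1, J2, J3]

BIG, SMALL, SELECTED = split_by_size(PAIRS, WEIGHT, CANDIDATES)
print(SELECTED, size(SMALL, WEIGHT, CANDIDATES))
```

Each pass through the loop removes at least one pair, so the loop terminates, and by construction the collection left over has size at most half of the initial size.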

Remark 6.13. In order to verify the Nazarov-Treil-Volberg conjecture, one should show that, if $f \in B_0^\sigma(I_0)$, and $f$ is energy-regular, then

$|B^{\mathrm{stop}}(f,g)| \lesssim \mathcal{H}\, \|f\|_{B_0^\sigma(I_0)} \|g\|_w.$

But, in the argument above, we have relied upon the fact that the partial sums of the martingale differences of an MBF function have finite variation. A bounded function need not even have martingale differences with finite quadratic variation, so there is no clear generalization of the proof above.

6.2. Minimal Bounded Fluctuation in the Down Position. The proof of the estimate (6.4) is reduced to $L^\infty$ testing and the already established estimate (6.5). The corona is essential. The function $f$ in the up position is assumed to be in $L^2_0(I_0)$. By first applying Theorem 6.6 with standard Calderón-Zygmund stopping data for $f$, it follows that it is sufficient to prove

$|B^{\mathrm{above}}(f,g)| \lesssim \mathcal{H}\, \|f\|_{B_1^\sigma(I_0)} \|g\|_w, \qquad g \in \mathrm{MBF}^w(I_0).$

The $L^2(\sigma)$ norm has been replaced by $B_1^\sigma(I_0)$, and the $B_0^w(I_0)$ norm has been replaced by the smaller $L^2(w)$ norm. By using the decomposition of Lemma 4.6, it suffices to consider the inequality above for $f$ in the smaller class $B_0^\sigma(I_0)$, with the corresponding norm on the right hand side. Now, the class $B_0^\sigma(I_0)$ is the finite linear combination of bounded functions and functions in the class $\mathrm{MBF}^\sigma(I_0)$. The case of a minimal bounded fluctuation function in the up position is the estimate of (6.5). Therefore, the remaining estimate to prove is as below, in which we invoke the $L^\infty$ testing constant.

Lemma 6.14. For an interval $I_0$, functions $f \in L^\infty(\sigma, I_0)$ and $g \in \mathrm{MBF}^w(I_0)$, there holds

$|B^{\mathrm{above}}(f,g)| \lesssim \{A_2^{1/2} + \mathcal{T}_\infty\}\, \|f\|_\infty\, \sigma(I_0)^{1/2} \|g\|_w.$



Proof. Note that we have by assumption on $L^\infty$ testing,

$|\langle H_\sigma f, g\rangle_w| \le \mathcal{T}_\infty \|f\|_\infty\, \sigma(I_0)^{1/2} \|g\|_w.$

Starting from this inequality, we also have, by Corollary 6.2,

$\bigl| \langle H_\sigma f, g\rangle_w - \{ B^{\mathrm{above}}(f,g) + B^{\mathrm{below}}(f,g) \} \bigr| \lesssim \mathcal{H} \|f\|_\sigma \|g\|_w.$

The right hand side is smaller than our first estimate. Moreover, the assumption that $g \in \mathrm{MBF}^w(I_0)$, and the estimate (6.5) for when the minimal bounded fluctuation term is in the up position give

(6.15) $|B^{\mathrm{below}}(f,g)| \lesssim \mathcal{H}\, \|f\|_\sigma \|g\|_{B_0^w(I_0)}.$

If there holds $\|g\|_{B_0^w(I_0)} \lesssim \|g\|_w$, we are finished. If not, we take standard Calderón-Zygmund stopping data for $g$, $\mathcal{G}$ and $\alpha_g(\cdot)$. By the corona estimate Theorem 6.6 in its dual form, we then have

$|B^{\mathrm{below}}(f,g)| \lesssim \mathcal{H} \|f\|_\sigma \|g\|_w + |B^{\mathrm{below}}_{\mathcal{G}}(f,g)|.$

The estimate (6.15) applies to each term $B^{\mathrm{below}}(Q^\sigma_G f, P^w_G g)$, for $G \in \mathcal{G}$. Quasi-orthogonality then completes the proof. □
7. The Functional Energy Inequality

We state an important multi-scale extension of the energy inequality (5.3).

Definition 7.1. Let $\mathcal{F}$ be a collection of dyadic intervals. A collection of functions $\{g_F\}_{F \in \mathcal{F}}$ in $L^2(w)$ is said to be $\mathcal{F}$-adapted if

(1) The functions $g_F \in L^2_0(F, w)$.

(2) Letting $\mathcal{J}(F) = \{J : \hat g_F(J) \ne 0\}$, these collections are pairwise disjoint in $F \in \mathcal{F}$.

(3) For all $J \in \mathcal{J}(F)$ it holds that $J \Subset F$.

(4) There is a finite number $p$ so that for all intervals $I$ the collection

(7.2) $\mathcal{K}(I) := \{J \subset I : J \text{ is maximal in } \mathcal{J}(F) \text{ for some } F \supset I\}$

has bounded overlaps: $\sum_{J \in \mathcal{K}(I)} \mathbf{1}_J \le p$.

Define $\mathcal{F}_{\subset}$-adapted similarly, with condition (3) replaced by

(3') For all $J \in \mathcal{J}(F)$ it holds that $J \subset F$.

Concerning this definition, a natural choice for $\mathcal{J}(F)$ would be $\{J \Subset F : \pi_{\mathcal{F}} J = F\}$, and one can check that this meets the definition above for $p = r$, the integer in the definition of $J \Subset F$. The more general definition will permit us to give a shorter proof of the parallel corona. We never need $p$ more than a fixed constant, so we suppress the dependence of our estimates on this number.

Definition 7.3. Let $\mathfrak{F}$ be the smallest constant in the inequality below, or its dual form. The inequality holds for all non-negative $h \in L^2(\sigma)$, all $\sigma$-Carleson collections $\mathcal{F}$, and all $\mathcal{F}$-adapted collections $\{g_F\}_{F \in \mathcal{F}}$:

$\sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F)} P(h\sigma, J)\, \bigl| \bigl\langle \tfrac{x}{|J|}, g_F \mathbf{1}_J \bigr\rangle_w \bigr| \le \mathfrak{F}\, \|h\|_\sigma \Bigl[ \sum_{F \in \mathcal{F}} \|g_F\|_w^2 \Bigr]^{1/2}.$

Here $\mathcal{J}^*(F)$ consists of the maximal intervals $J$ in the collection $\mathcal{J}(F)$. Note that the estimate is universal in $h$ and $\mathcal{F}$, separately.

This constant was identified in [9], and is herein shown to be necessary from the $A_2$ and interval testing inequalities.

Theorem 7.4. There holds the inequality $\mathfrak{F} \lesssim \mathcal{H}$.

The first step in the proof is the domination of the constant $\mathfrak{F}$ by the best constant in a certain two weight inequality for the Poisson operator, with the weights being determined by $w$ and $\sigma$ in a particular way. This is the decisive step, since there is a two weight inequality for the Poisson operator proved by one of us. It reduces the full norm inequality to simpler testing conditions, which are in turn controlled by the $A_2$ and Hilbert transform testing conditions.

The way that the functional inequality is used is as in the following more technical corollary, which uses the weaker definition of $\mathcal{F}_{\subset}$-adapted.

Corollary 7.5. Let $f \in L^2(\sigma)$ have Calderón-Zygmund stopping data $\mathcal{F}$ and $\alpha_f(\cdot)$. For all $\mathcal{F}_{\subset}$-adapted functions $\{g_F\}$, there holds

$\Bigl| \sum_{F \in \mathcal{F}}\ \sum_{I : I \supsetneq F} \mathbb{E}^\sigma_{I_F} \Delta^\sigma_I f \cdot \langle H_\sigma I_F, g_F\rangle_w \Bigr| \lesssim \mathcal{H}\, \|f\|_\sigma \Bigl[ \sum_{F \in \mathcal{F}} \|g_F\|_w^2 \Bigr]^{1/2}.$

The dual inequality also holds.

The proof of $\mathfrak{F} \lesssim \mathcal{H}$ is taken up in the next few subsections, and then the proof of the corollary concludes this section.

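7.1. The Two Weight Poisson Inequality. Consider the weight

$\mu := \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F)} \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2 \cdot \delta_{(x_J, |J|)}.$

Here, $P^w_{F,J} := \sum_{J' \in \mathcal{J}(F) : J' \subset J} \Delta^w_{J'}$. We can replace $x$ by $x - c$ for any choice of $c$ we wish; the projection is unchanged. And $\delta_q$ denotes a Dirac unit mass at a point $q$ in the upper half plane $\mathbb{R}^2_+$.

We prove the two-weight inequality for the Poisson integral:

$\|P(h\sigma)\|_{L^2(\mathbb{R}^2_+, \mu)} \lesssim \mathcal{H} \|h\|_\sigma,$

for all nonnegative $h$. Above, $P(\cdot)$ denotes the Poisson extension to the upper half-plane, so that in particular

$\|P(h\sigma)\|^2_{L^2(\mathbb{R}^2_+, \mu)} = \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F)} P(h\sigma)(x_J, |J|)^2\, \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2,$

where $x_J$ is the center of the interval $J$. The proof of Theorem 7.4 follows by duality.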

Phrasing things in this way brings a significant advantage: The characterization of the two-weight inequality for the Poisson operator, [27], reduces the full norm inequality above to these testing inequalities. For any dyadic interval $I \in \mathcal{D}$,

(7.6) $\int_{\mathbb{R}^2_+} P(\sigma \cdot I)^2\, d\mu(x,t) \lesssim \mathcal{H}^2 \sigma(I),$

(7.7) $\int_{\mathbb{R}} P^*(t \mathbf{1}_{\hat I} \mu)^2\, \sigma(dx) \lesssim A_2 \int_{\hat I} t^2\, \mu(dx, dt),$

where $\hat I = I \times [0, |I|]$ is the box over $I$ in the upper half-plane, and $P^*$ is the dual Poisson operator

$P^*(t \mathbf{1}_{\hat I} \mu)(x) = \int_{\hat I} \frac{t^2}{t^2 + |x - y|^2}\, \mu(dy, dt).$

One should keep in mind that the intervals $I$ are restricted to be in our fixed dyadic grid, a reduction allowed as the integrations on the left in (7.6) and (7.7) are done over the entire space, either $\mathbb{R}^2_+$ or $\mathbb{R}$. (Goodness of the intervals $I$ above is not needed.) This reduction is critical to the analysis below.

Remark 7.8. A gap in the proof of the Poisson inequality at [27, Page 542] can be fixed as in [28] or [7].

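7.2. The Poisson Testing Inequality: The Core. This subsection is concerned with this part of inequality (7.6): Restrict the integral on the left to the set $\hat I \subset \mathbb{R}^2_+$:

$\int_{\hat I} P(\sigma \cdot I)^2\, d\mu(x,t) \lesssim \mathcal{H}^2 \sigma(I).$

Since $(x_J, |J|) \in \hat I$ if and only if $J \subset I$, we have

$\int_{\hat I} P(\sigma \cdot I)(x,t)^2\, d\mu(x,t) = \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F) : J \subset I} P(\sigma \cdot I)(x_J, |J|)^2\, \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\qquad \lesssim \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F) : J \subset I} P(\sigma \cdot I, J)^2\, \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2.$

For each $J$,

(7.9) $\Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2 \le \int_J \Bigl| \frac{x - \mathbb{E}^w_J x}{|J|} \Bigr|^2 dw(x) = 2 E(w,J)^2 w(J) \le 2 w(J).$

A straightforward estimation is not possible, because the intervals $J$ overlap. The intervals $\mathcal{F}$ obey a $\sigma$-Carleson measure condition, which we exploit in the first stage of the proof. We 'create some holes' by restricting the support of $\sigma$ to the interval $F \cap I$ in the sum below.

$\sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F) : J \subset I} P((F \cap I)\sigma, J)^2\, \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2 = \Bigl\{ \sum_{F \in \mathcal{F} : F \subset I} + \sum_{F \in \mathcal{F} : F \supsetneq I} \Bigr\} \sum_{J \in \mathcal{J}^*(F) : J \subset I} P((F \cap I)\sigma, J)^2\, \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\qquad = A + B.$

The first of these terms is at most

$A \lesssim \sum_{F \in \mathcal{F} : F \subset I}\ \sum_{J \in \mathcal{J}^*(F)} P(F\sigma, J)^2\, E(w,J)^2\, w(J) \lesssim \mathcal{H}^2 \sum_{F \in \mathcal{F} : F \subset I} \sigma(F) \lesssim \mathcal{H}^2 \sigma(I).$

Here we have used (7.9), the energy inequality (5.3), and that the stopping intervals $\mathcal{F}$ satisfy a $\sigma$-Carleson measure estimate, property (2) of Definition 3.4.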

Concerning the second term, this is the point at which condition (4) of Definition 7.1, namely (7.2), enters into the proof. We can write the collection $\mathcal{K}(I)$ of (7.2) as the union of collections $\mathcal{K}_k$, $1 \le k \le p$, where the intervals in each $\mathcal{K}_k$ are a subpartition of $I$. Using (7.9) and the energy inequality, term $B$ satisfies

$B \le \sum_{k=1}^{p}\ \sum_{J \in \mathcal{K}_k} P(\sigma \cdot I, J)^2\, \Bigl\| P^w_{F_J,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\quad \lesssim \sum_{k=1}^{p}\ \sum_{J \in \mathcal{K}_k} P(\sigma \cdot I, J)^2\, E(w,J)^2\, w(J) \lesssim p\, \mathcal{H}^2 \sigma(I).$

In the first line, $F_J$ is the unique $F \in \mathcal{F}$ with $J \in \mathcal{J}^*(F)$ and $F \supsetneq I$.
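It remains then to show the following inequality with 'holes':

$\sum_{F \in \mathcal{F}_I}\ \sum_{J \in \mathcal{J}^*(F)} P(\sigma(I - F), J)^2\, \Bigl\| P^w_{F,J} \tfrac{x}{|J|} \Bigr\|_w^2 \lesssim \mathcal{H}^2 \sigma(I),$

where $\mathcal{F}_I$ consists of those $F \in \mathcal{F}$ with $F \subset I$. Our purpose is to pass back to the Hilbert transform, so that we can effectively use the testing condition. The inequality above can be expressed in dual language as the inequality

$\sum_{F \in \mathcal{F}_I}\ \sum_{J \in \mathcal{J}^*(F)} P(\sigma(I - F), J)\, \bigl\langle P^w_{F,J} \tfrac{x}{|J|}, g \bigr\rangle_w \lesssim \mathcal{H}\, \sigma(I)^{1/2} \|g\|_w.$

In the inner product, $g \in L^2(w)$ can be replaced by $g_{F,J} := P^w_{F,J}\, g$ by self-adjointness of the projections. Also, $\langle x, h^w_J\rangle_w > 0$, so that we are free to assume that $\langle g, h^w_J\rangle_w \ge 0$ for all $J$. We can estimate, using the monotonicity property (5.2),

$P(\sigma(I - F), J)\, \bigl\langle \tfrac{x}{|J|}, g_{F,J} \bigr\rangle_w \approx \langle H_\sigma(I - F), g_{F,J}\rangle_w, \qquad J \in \mathcal{J}^*(F).$

It therefore suffices to show that

(7.10) $\Bigl| \sum_{F \in \mathcal{F}_I}\ \sum_{J \in \mathcal{J}^*(F)} \langle H_\sigma(I - F), g_{F,J}\rangle_w \Bigr| \lesssim \mathcal{H}\, \sigma(I)^{1/2} \|g\|_w.$

Use linearity in the argument of the Hilbert transform, which gives two terms. The first is

$\Bigl| \sum_{F \in \mathcal{F}_I}\ \sum_{J \in \mathcal{J}^*(F)} \langle H_\sigma I, g_{F,J}\rangle_w \Bigr| = |\langle H_\sigma I, g\rangle_w| \lesssim \mathcal{H}\, \sigma(I)^{1/2} \|g\|_w.$

The second term appeals to interval testing and the $\sigma$-Carleson measure condition (3.3).

$\Bigl| \sum_{F \in \mathcal{F}_I}\ \sum_{J \in \mathcal{J}^*(F)} \langle H_\sigma F, g_{F,J}\rangle_w \Bigr| \le \sum_{F \in \mathcal{F}_I} \Bigl| \Bigl\langle H_\sigma F, \sum_{J \in \mathcal{J}^*(F)} g_{F,J} \Bigr\rangle_w \Bigr|$
$\qquad \le \mathcal{H} \sum_{F \in \mathcal{F}_I} \sigma(F)^{1/2} \Bigl\| \sum_{J \in \mathcal{J}^*(F)} g_{F,J} \Bigr\|_w$
$\qquad \le \mathcal{H} \Bigl[ \sum_{F \in \mathcal{F}_I} \sigma(F) \times \sum_{F \in \mathcal{F}_I}\ \sum_{J \in \mathcal{J}^*(F)} \|g_{F,J}\|_w^2 \Bigr]^{1/2}$
$\qquad \lesssim \mathcal{H}\, \sigma(I)^{1/2} \|g\|_w.$

We also use the orthogonality of the functions $g_{F,J}$. This completes the proof of (7.10).

All the remaining estimates use the orthogonality of $g_{F,J}$ and the $A_2$ condition. The details are below.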



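7.3. The Poisson Testing Inequality: The Remainder. Now we turn to proving the following estimate for the global part of the first testing condition (7.6):

$\int_{\mathbb{R}^2_+ - \hat I} P(\sigma \cdot I)^2\, d\mu \lesssim A_2\, \sigma(I).$

Decompose the integral on the left into four terms. With $F_J$ the unique $F \in \mathcal{F}$ with $J \in \mathcal{J}^*(F)$,

$\int_{\mathbb{R}^2_+ - \hat I} P(\sigma \cdot I)^2\, d\mu = \sum_{J : (x_J, |J|) \in \mathbb{R}^2_+ - \hat I} P(\sigma \cdot I)(x_J, |J|)^2\, \Bigl\| P^w_{F_J,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\qquad = \Bigl\{ \sum_{J : J \cap 3I = \emptyset,\ |J| \le |I|} + \sum_{J : J \subset 3I - I} + \sum_{J : J \cap I = \emptyset,\ |J| > |I|} + \sum_{J : J \supsetneq I} \Bigr\}\, P(\sigma \cdot I)(x_J, |J|)^2\, \Bigl\| P^w_{F_J,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\qquad = A + B + C + D.$

Decompose term $A$ according to the length of $J$ and its distance from $I$, and then use (7.9) to obtain:

$A \lesssim \sum_{n=0}^{\infty} \sum_{k=1}^{\infty}\ \sum_{J : J \subset 3^{k+1}I - 3^kI,\ |J| = 2^{-n}|I|} \Bigl( \frac{|J|}{\operatorname{dist}(J,I)^2}\, \sigma(I) \Bigr)^2 w(J)$
$\quad \lesssim \sum_{n=0}^{\infty} 2^{-2n} \sum_{k=1}^{\infty} 3^{-2k}\, \frac{\sigma(I)\, w(3^{k+1}I - 3^kI)}{|3^kI|^2}\, \sigma(I)$
$\quad \lesssim \sum_{n=0}^{\infty} 2^{-2n} \sum_{k=1}^{\infty} 3^{-2k}\, \frac{\sigma(3^{k+1}I)\, w(3^{k+1}I)}{|3^kI|^2}\, \sigma(I) \lesssim A_2\, \sigma(I).$

Decompose term $B$ according to the length of $J$ and then use the Poisson inequality (5.5), available to us because of goodness of intervals $J$. We then obtain

$B \lesssim \sum_{n=0}^{\infty}\ \sum_{J : J \subset 3I - I,\ |J| = 2^{-n}|I|} 2^{-n(2 - 4\varepsilon)} \Bigl( \frac{\sigma(I)}{|I|} \Bigr)^2 w(J)$
$\quad \lesssim \sum_{n=0}^{\infty} 2^{-n(2 - 4\varepsilon)}\, \frac{\sigma(I)\, w(3I)}{|I|^2}\, \sigma(I) \lesssim A_2\, \sigma(I).$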
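For term $C$, split the sum according to whether or not $I$ intersects the triple of $J$:

$C \lesssim \Bigl\{ \sum_{J : I \cap 3J = \emptyset,\ |J| > |I|} + \sum_{J : I \subset 3J - J,\ |J| > |I|} \Bigr\} \Bigl( \frac{|J|}{\operatorname{dist}(J,I)^2}\, \sigma(I) \Bigr)^2 \Bigl\| P^w_{F_J,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\quad = C_1 + C_2.$

To estimate $C_1$, let $\{B_i\}_{i=1}^{\infty}$ be the maximal intervals in the collection of triples

$\{3J : |J| > |I| \text{ and } 3J \cap I = \emptyset\},$

arranged in order of increasing side length. These intervals are not disjoint, but have bounded overlap, $\sum_{i=1}^{\infty} \mathbf{1}_{B_i} \le 3$. Group the intervals $J$ by the inclusion $3J \subset B_i$, appeal to $P^w_{F_J,J}\, x = P^w_{F_J,J}(x - c(B_i))$, and use the mutual orthogonality of the $P^w_{F_J,J}$:

$C_1 \lesssim \sum_{i=1}^{\infty}\ \sum_{J : 3J \subset B_i} \Bigl( \frac{\sigma(I)}{\operatorname{dist}(J,I)^2} \Bigr)^2 \bigl\| P^w_{F_J,J}\, x \bigr\|_w^2$
$\quad \lesssim \sum_{i=1}^{\infty} \Bigl( \frac{\sigma(I)}{\operatorname{dist}(B_i,I)^2} \Bigr)^2 \sum_{J : 3J \subset B_i} \bigl\| P^w_{F_J,J}(x - c(B_i)) \bigr\|_w^2$
$\quad \lesssim \sum_{i=1}^{\infty} \Bigl( \frac{\sigma(I)}{\operatorname{dist}(B_i,I)^2} \Bigr)^2 \bigl\| \mathbf{1}_{B_i}(x - c(B_i)) \bigr\|_w^2$
$\quad \lesssim \sum_{i=1}^{\infty} \Bigl( \frac{\sigma(I)}{\operatorname{dist}(B_i,I)^2} \Bigr)^2 |B_i|^2\, w(B_i)$
$\quad \lesssim \Bigl\{ \sum_{i=1}^{\infty} \frac{|B_i|^2\, \sigma(I)\, w(B_i)}{\operatorname{dist}(B_i,I)^4} \Bigr\}\, \sigma(I) \lesssim A_2\, \sigma(I).$

Next we turn to estimating term $C_2$, where the triple of $J$ contains $I$ but $J$ itself does not. Note that there are at most two such intervals $J$ of a given length, one to the left and one to the right of $I$. So with this in mind we sum over the intervals $J$ according to their lengths and use (7.9) to obtain

$C_2 = \sum_{n=0}^{\infty}\ \sum_{J : I \subset 3J - J,\ |J| = 2^n|I|} \Bigl( \frac{|J|}{\operatorname{dist}(J,I)^2}\, \sigma(I) \Bigr)^2 \Bigl\| P^w_{F_J,J} \tfrac{x}{|J|} \Bigr\|_w^2$
$\quad \lesssim \sum_{n=0}^{\infty} \Bigl( \frac{\sigma(I)}{|2^nI|} \Bigr)^2 w(3 \cdot 2^n I)$
$\quad = \Bigl\{ \frac{\sigma(I)}{|I|} \sum_{n=0}^{\infty} \frac{|I|\, w(3 \cdot 2^n I)}{|2^nI|^2} \Bigr\}\, \sigma(I)$
$\quad \lesssim \Bigl\{ \frac{\sigma(I)}{|I|}\, P(w,I) \Bigr\}\, \sigma(I) \lesssim A_2\, \sigma(I).$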
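The last term $D$ is handled in the same way as term $C_2$. The intervals $J$ occurring here are included in the set of ancestors $A_k = \pi^{(k)}_{\mathcal{D}} I$ of $I$, $1 \le k < \infty$. We thus have

$D = \sum_{k=1}^{\infty} P(\sigma \cdot I)(c(A_k), |A_k|)^2\, \Bigl\| P^w_{F_{A_k},A_k} \tfrac{x}{|A_k|} \Bigr\|_w^2$
$\quad \lesssim \sum_{k=1}^{\infty} \Bigl( \frac{\sigma(I)}{|A_k|} \Bigr)^2 w(A_k)$
$\quad = \Bigl\{ \frac{\sigma(I)}{|I|} \sum_{k=1}^{\infty} \frac{|I|\, w(A_k)}{|A_k|^2} \Bigr\}\, \sigma(I)$
$\quad \lesssim \Bigl\{ \frac{\sigma(I)}{|I|}\, P(w,I) \Bigr\}\, \sigma(I) \lesssim A_2\, \sigma(I).$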



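7.4. The Dual Poisson Testing Inequality. We are considering (7.7). Note that the expressions on the two sides of this inequality are

$\int_{\hat I} t^2\, \mu(dx, dt) = \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F) : J \subset I} \bigl\| P^w_{F,J}\, x \bigr\|_w^2,$

$P^*(t \mathbf{1}_{\hat I} \mu)(x) = \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}^*(F) : J \subset I} \frac{\bigl\| P^w_{F,J}\, x \bigr\|_w^2}{|J|^2 + |x - x_J|^2}.$

Some bookkeeping is in order. We take $J \in \mathcal{J}_e(F)$ iff $J \subset I$, and $F$ is the maximal interval in $\mathcal{F}$ with $J \in \mathcal{J}^*(F)$. Then each interval $J$ is in at most one collection $\mathcal{J}_e(F)$. Below, we understand the sum over the empty set to be the zero projection. Define

$Q^w_J := \sum_{F \in \mathcal{F} : J \in \mathcal{J}^*(F)} P^w_{F,J}.$

These are mutually orthogonal projections, and are indexed solely by $J \in \mathcal{D}$. By the mutual orthogonality of the $P^w_{F,J}$, the quantities above are equal to

$\int_{\hat I} t^2\, \mu(dx, dt) = \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}_e(F)} \bigl\| Q^w_J\, x \bigr\|_w^2,$

$P^*(t \mathbf{1}_{\hat I} \mu)(x) = \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}_e(F)} \frac{\bigl\| Q^w_J\, x \bigr\|_w^2}{|J|^2 + |x - x_J|^2}.$

We are to dominate $\|P^*(t \mathbf{1}_{\hat I} \mu)\|_\sigma^2$ by the first expression above. Expanding the squared norm, the diagonal term is

$\sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}_e(F)} \int \Bigl[ \frac{\| Q^w_J\, x \|_w^2}{|J|^2 + |x - x_J|^2} \Bigr]^2 d\sigma \le M_1 \sum_{F \in \mathcal{F}}\ \sum_{J \in \mathcal{J}_e(F)} \bigl\| Q^w_J\, x \bigr\|_w^2,$

where

$M_1 = \sup_{F \in \mathcal{F}}\ \sup_{J \in \mathcal{J}_e(F)} \int \frac{\| Q^w_J\, x \|_w^2}{(|J|^2 + |x - x_J|^2)^2}\, d\sigma.$

But, by inspection, $M_1$ is dominated by the $A_2$ constant. Indeed, for any $J$, we have by (7.9)

$\int \frac{\| Q^w_J\, x \|_w^2}{(|J|^2 + |x - x_J|^2)^2}\, d\sigma \lesssim \frac{w(J)}{|J|} \int \frac{|J|^3}{(|J|^2 + |x - x_J|^2)^2}\, \sigma(dx) \lesssim A_2.$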

Having fixed ideas, we fix an integer $s$, and consider those intervals $J$, $J'$ in the collections $\mathcal{J}_e(F)$, $\mathcal{J}_e(F')$ with $|J'| = 2^{-s}|J|$. The expression to control is



T, 



III I 

FeJ c "je 1 7 e (F)F'eJ c ' J'eJ e (F) ° 
F'*F |j'|=2-.|] 



||Qfx 



||QKx 



+ |x-xj| 2 |J'| 2 +|x-c(J')| 2 



da 



<M 2 ^ £ ||Q 

FeJ^ jeJ e (F) 



w ||2 
J x llw 



where M 2 = sup sup 

F'#F |J'|=2- S |J| 



||Q«x| 



+ |x-Xj|2 |J'|2 + |x-c(J')| 2 



do. 



We claim the term $M_2$ is at most a constant times $A_2\, 2^{-s}$. To see this, fix $J$ as in the definition of $M_2$, and use (7.9) to estimate the integral on the right by

2~2s 

-do < A 2 - 



w(J') 



V\2 



|x-xjP |J'| 2 +|x-c(J')| 2 



1 



where n is an integer chosen so that (n 
}' as follows. 

9— 2s 

L L 

F'eJ" J'GJ £ (F') : |J'|=2- S |J| 
(n-1)|J|<dist(J,J')<n|J| 



< dist(J, J') < n|J|. Then estimate the sum over 



1 + n 2 



< 



1 



n 



because the relative lengths of J and ]' are fixed, and each J' is in at most one j7" e (F). This is 
summable over n £ N to 2~ s , so it completes our proof. 

7.5. Proof of Corollary 7.5. Write g F = g\ + g 2 , where g F := 2Zjej(F) ■ j^f Afg. We argue the 
case of $g^2_F$ first. These functions, indexed by $F \in \mathcal F$, are $\mathcal F$-adapted; the only point that is not
immediate is the technical condition (7.2). Take an interval $I$ and a pair $(J,F)$ with $J \subset I \subset F$ and $J$
maximal in $\mathcal J(F)$. It must be that $\pi_{\mathcal F} J = F$, hence $F \subset \pi^{r} I$. That is, $F$ can take at most $r$
possible values in $\mathcal F$. Hence, this condition holds with $p = r$.

This argument only depends upon the functional energy inequality. The argument of the Hilbert
transform is $I_F$, the child of $I$ that contains $F$. Write $I_F = F + (I_F - F)$, and use linearity of $H_\sigma$.
Note that by the standard martingale difference identity and the construction of stopping data,



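$$\Bigl|\sum_{I \,:\, I \supsetneq F} \mathbb E^\sigma_{I_F}\Delta^\sigma_I f\Bigr| \lesssim \alpha_f(F), \qquad F \in \mathcal F.$$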
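The martingale difference identity used here is the usual telescoping sum. A sketch, assuming the standard definition $\Delta^\sigma_I f = \sum_{I'} \mathbf 1_{I'}(\mathbb E^\sigma_{I'} f - \mathbb E^\sigma_I f)$ over the children $I'$ of $I$, reading $\mathbb E^\sigma_{I_F}\Delta^\sigma_I f$ as the constant value of $\Delta^\sigma_I f$ on $I_F$, and writing $I_0$ for the largest interval in the sum (a symbol introduced here for illustration):

```latex
% Ancestors of F: F = I^{(0)} \subsetneq I^{(1)} \subsetneq \dots \subsetneq I^{(m)} = I_0,
% so that the child of I^{(k)} containing F is I^{(k-1)}.
\begin{align*}
\mathbb E^\sigma_{I_F}\Delta^\sigma_I f
  &= \mathbb E^\sigma_{I_F} f - \mathbb E^\sigma_{I} f, \\
\sum_{I \,:\, F \subsetneq I \subset I_0} \mathbb E^\sigma_{I_F}\Delta^\sigma_I f
  &= \sum_{k=1}^{m} \bigl( \mathbb E^\sigma_{I^{(k-1)}} f - \mathbb E^\sigma_{I^{(k)}} f \bigr)
   = \mathbb E^\sigma_{F} f - \mathbb E^\sigma_{I_0} f .
\end{align*}
```

The construction of the stopping data is then what controls the resulting averages by a multiple of $\alpha_f(F)$.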
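Hence, the first term is

$$\Bigl|\sum_{F\in\mathcal F}\ \sum_{I \,:\, I \supsetneq F} \mathbb E^\sigma_{I_F}\Delta^\sigma_I f \cdot \langle H_\sigma F, g^2_F\rangle_w\Bigr| \lesssim \sum_{F\in\mathcal F} \alpha_f(F)\,\bigl|\langle H_\sigma F, g^2_F\rangle_w\bigr| \lesssim \mathcal H \sum_{F\in\mathcal F} \alpha_f(F)\,\sigma(F)^{1/2}\,\|g^2_F\|_w.$$

This just uses interval testing. Quasi-orthogonality bounds this last expression.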
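The appeal to quasi-orthogonality can be spelled out with Cauchy-Schwarz. This is a sketch under two assumptions not restated in this section: that quasi-orthogonality takes the form $\sum_{F} \alpha_f(F)^2 \sigma(F) \lesssim \|f\|_\sigma^2$, and that the functions $g_F$ (standing for whichever family is being treated) have disjoint Haar supports, so that $\sum_F \|g_F\|_w^2 \le \|g\|_w^2$.

```latex
\begin{align*}
\sum_{F\in\mathcal F} \alpha_f(F)\,\sigma(F)^{1/2}\,\|g_F\|_w
  &\le \Bigl( \sum_{F\in\mathcal F} \alpha_f(F)^2\,\sigma(F) \Bigr)^{1/2}
       \Bigl( \sum_{F\in\mathcal F} \|g_F\|_w^2 \Bigr)^{1/2}
     && \text{(Cauchy--Schwarz)} \\
  &\lesssim \|f\|_\sigma\, \|g\|_w
     && \text{(assumed quasi-orthogonality and orthogonality).}
\end{align*}
```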

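For the second expression, when the argument of the Hilbert transform is $I_F - F$, first note that

$$\Bigl|\sum_{I \,:\, I \supsetneq F} \mathbb E^\sigma_{I_F}\Delta^\sigma_I f \cdot (I_F - F)\Bigr| \lesssim \Phi := \sum_{F'\in\mathcal F} \alpha_f(F')\cdot F', \qquad F \in \mathcal F.$$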

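Therefore, by the definition of $\mathcal F$-adapted, the monotonicity property (5.2) applies, and yields

$$\Bigl|\sum_{I \,:\, I \supsetneq F} \mathbb E^\sigma_{I_F}\Delta^\sigma_I f \cdot \langle H_\sigma(I_F - F), g^2_F\rangle_w\Bigr| \lesssim \sum_{J\in\mathcal J^*(F)} P(\Phi\sigma, J)\,\Bigl\langle \frac{x}{|J|}, J g^2_F\Bigr\rangle_w, \qquad F \in \mathcal F.$$

The sum over $F \in \mathcal F$ of this last expression is controlled by functional energy, and the property
that $\|\Phi\|_\sigma \lesssim \|f\|_\sigma$.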

Remark 7.11. The proof above is the paraproduct trick of Nazarov-Treil-Volberg. Applying this
trick to stopping intervals for $f$ eliminates the need to verify that certain derived measures are
$\sigma$-Carleson measures, a delicate part of the arguments in [8,17,29].

We return to the functions $g^1_F$ defined at the beginning of the proof; the Haar supports of these
functions are 'close to $F$'. Define functions



$$g^s_F := \sum_{F' \,:\, \pi^s_{\mathcal F} F' = F} g^1_{F'}, \qquad s \in \mathbb N.$$



Here, the sum is over $F'$ which are $s$ steps below $F$ in the $\mathcal F$ tree. It is straightforward to verify
that the functions $\{g^{r+1}_F\}$ are also $\mathcal F$-adapted. Hence, by the argument for $g^2_F$, there holds



^^E^f-(FU F ,g F +1 >, 



FG-F I : IaF 



n V2 



It remains to establish the estimate below, uniform over $1 \le s \le r$ and $F \in \mathcal F$.

Y_ Ef F Aff.(H ff I F ,gf) w <'K^W) yi \\&^. 

I : FglCn^F 

For then, quasi-orthogonality completes the case. The distinction here is that we need not use 
functional energy. 

There is an elementary subcase. In [8, Proposition 2.8], it is shown that for any A > 1, 



sup 

IJ : inj#o 
IJI<A|I|<A 2 |J| 



(FU,J) W < C A HJv(l)w(l). 
Apply this with $A = 2^{2r}$ to see that the estimate above holds for the sum below, where the relative
lengths of $I$ and $J$ are controlled.

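$$\sum_{I \,:\, F \subsetneq I \subset \pi^s_{\mathcal F} F}\ \sum_{\substack{J \,:\, J \subset F \\ |I| \le 2^{2r}|J|}} \mathbb E^\sigma_{I_F}\Delta^\sigma_I f \cdot \langle H_\sigma I_F, \Delta^w_J g^s_F\rangle_w.$$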

It remains to consider the complementary sum. In this case, we return to the paraproduct trick
mentioned above. Let $\mathcal F'$ be the intervals $F' \in \mathcal F$ with $\pi^s_{\mathcal F} F' = F$. Then, for each $F' \in \mathcal F'$, we
write the argument of the Hilbert transform as $I_F = F + (I_F - F)$. Interval testing shows that

IrFsICTtjrF J:JcF 
|I|>2 2r |J| 

satisfies the correct bound. For the second term, use the monotonicity property to see that 



Y_ Y. Ef F Aff-(H a (I F -F),Afg|) w < P(aF, J)F_(w, J)||Pf g 



S I 

fIIw > 



I:FgIC7r^F J:J£F PeJ r JeJ(F') 

|I|>2 2r |J| 

where $\mathcal J(F')$ consists of the maximal intervals $J$ in the Haar support of $g^1_{F'}$ so that there is an $I \supset F$ with
$2^{2r}|J| < |I|$. Here, $P^w_J := \sum_{J' \,:\, J' \subset J} \Delta^w_{J'}$. Cauchy-Schwarz and the energy inequality (5.3) conclude
this estimate.